Hi,
I have a problem with the reshape command, which seems extremely slow.
My data file contains x and y coordinates for n different points, and I want
to calculate the distance between each pair of points (using Pythagoras' theorem).
For various reasons, I want to calculate n*n distances rather than
(n*(n-1))/2 unique distances.
The following code simulates exactly the kind of data I have and then
performs the calculation (the number of points is set to 10 here).
// SIMULATION
local n = 10
range point 1 `n' `n'
gen x = int(abs(uniform()*10000000))
gen y = int(abs(uniform()*10000000))
// CALCULATION
local rows = _N
forvalues n1 = 1/`rows' {
gen point_`n1' = point[`n1']
gen x_`n1' = x[`n1']
gen y_`n1' = y[`n1']
}
reshape long point_ x_ y_ , i(point) j(row)
gen xdiff = abs(x-x_)
gen ydiff = abs(y-y_)
gen distance = sqrt((xdiff^2)+(ydiff^2))
The problem is that I need to do this for 9230 points. Everything goes fine
(and fast) until the reshape command.
Then Stata seems to get stuck. I run this on a multiprocessor Windows Vista
machine using StataMP 9.2. I have set the memory to 25 GB, which is more than
sufficient. I have also set the maximum number of variables to 32000. So it
shouldn't really be the hardware that fails.
I had the above loop with n = 9230 running for three days (72 hours!), but
the reshape command couldn't complete within that time.
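For scale, here is a rough size check. It is only a sketch, and it assumes
that the loop creates three new variables per point (point_, x_ and y_) as
above and that reshape long then stacks one observation per pair of points:
// SIZE CHECK
local n = 9230
display "wide variables created by the loop: " 3*`n'
display "observations after reshape long: " `n'^2
That is 27,690 wide variables and 85,192,900 observations in long form.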
Does anyone have suggestions for how the calculation could be run faster, or
without using the reshape command? And why is reshape so tediously slow?
Thanks
Martin Hällsten