Hi,
there might be another way. I do not know whether it is faster, but it
might work:
Reshape the data set to long form, with one row per pair of stations:
id_i id_j dist
1 1 0
1 2 23
1 3 21
...
1 2500 530
and so on.
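A minimal sketch of that reshape, assuming the wide layout from the quoted post (variables id and d1 to d2500) is in memory. The new names id_i, id_j and dist are chosen only to match the listing above:

```stata
* Assumes wide data: id d1 d2 ... d2500 (plus min_dis, which is
* constant within id and is simply carried along).
reshape long d, i(id) j(id_j)

* Rename to the names used in the listing above.
rename id id_i
rename d dist
```

With 2500 stations this produces 2500 x 2500 = 6,250,000 rows, so memory use is worth checking before running it on the full data.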
Get the shortest distance (excluding each station's zero distance to itself):
. by id_i, sort: egen mindist=min(dist) if dist>0
Now look for the station:
. gen helpvar=mindist-dist
which is zero for the closest station. You can run a quick check first
and then get the id (in a slightly roundabout way):
. tab mindist        (this is the check)
. gen helpnear_id=id_j if helpvar==0
. replace helpnear_id=0 if helpnear_id==.
. by id_i, sort: egen near_id=max(helpnear_id)
. drop helpnear_id helpvar
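To try the steps on something small first, here is a sketch on a made-up three-station distance matrix. The toy values are an assumption for illustration only; the commands mirror the steps above:

```stata
clear
* Toy data (hypothetical): symmetric distances between 3 stations.
input id d1 d2 d3
1  0 23 29
2 23  0 32
3 29 32  0
end

* Wide to long: one row per pair (id_i, id_j).
reshape long d, i(id) j(id_j)
rename id id_i
rename d dist

* Shortest nonzero distance per station; missing on the dist==0 rows.
bysort id_i: egen mindist = min(dist) if dist > 0

* helpvar is 0 only on the row of the nearest station.
gen helpvar = mindist - dist
gen helpnear_id = id_j if helpvar == 0
replace helpnear_id = 0 if helpnear_id == .

* Spread the nearest id to every row of the station.
* If two stations tie for nearest, max() keeps the larger id.
bysort id_i: egen near_id = max(helpnear_id)
drop helpnear_id helpvar

list id_i id_j dist near_id, sepby(id_i)
```

On this toy matrix near_id comes out as 2 for station 1, and as 1 for stations 2 and 3.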
Finally, you can reshape back to wide form to get the result as a matrix again.
However, I do not know whether this beats 1.4 hours, since reshape itself is
rather time consuming :-)
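A sketch of the reshape back, assuming the long data with id_i, id_j, dist, mindist and near_id is in memory. One caveat: because mindist was created with "if dist>0", it is missing on each station's own row and so is not constant within id_i, which reshape wide does not accept for a variable that is not being reshaped.

```stata
* mindist is missing where dist==0, so drop it before going wide
* (min_dis from the original data, if still present, is constant
* within station and carries over unchanged).
drop mindist

* near_id is constant within id_i, so it is kept as a single column.
reshape wide dist, i(id_i) j(id_j)
```

The result has one row per station with dist1, dist2, ... and near_id.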
Stephan
---
Stephan Brunow
MSc. in Economics und Diplom-Verkehrswirtschaftler
Professur für VWL, insb. Makroökonomik und
Raumwirtschaftslehre/Regionalwissenschaften
Institut für Wirtschaft und Verkehr
Fakultät für Verkehrswissenschaften „Friedrich List“
Technische Universität Dresden
D-01062 Dresden
http://tu-dresden.de/regionalscience
Phone: ++49-(0)351-463-36806
Fax: ++49-(0)351-463-36819
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jitian Sheu
Sent: Tuesday, 6 June 2006 11:41
To: [email protected]
Subject: st: More efficient way of programming
Dear listers:
I have a data set with the following structure:
id d1 d2 d3..... d2500 min_dis
1 0 23 21 530 21
2 23 0
3
4
5
...
(up to 2500)
i.e., the number of observations is 2500, and each one represents one station (id).
dX = the distance to station X, X=1...2500
(since there are 2500 observations, I have 2500 distance variables)
min_dis = the distance to the nearest station.
So, for each observation (station), I know the minimum distance to another
station.
Now I want to know the id of its nearest station,
i.e., I want another variable (say, near_id) that holds, for each
observation, the id of its nearest station.
For example (using the above data):
id d1 d2 d3..... d2500 min_dis ==> near_id
1 0 23 29 530 21 ==> 2
2 23 0 32 41 23 ==> 1
3 29 32 0 52 21 ==> 2
4
5
...
For this purpose, I use the following programming code.
Basically, I am doing this observation by observation:
gen near_id=.
forvalues i=1(1)2500 {
    forvalues j=1(1)2500 {
        replace near_id =`j' if id==`i' & d`j'==min_dis
    }
}
Therefore, there are 2500 x 2500 iterations in total.
If each pass of the outer loop takes 2 seconds, I need 2500 x 2 = 5000 seconds
to finish the whole process, which is about 1.4 hours.
Is there a more efficient way to do that?
Many thanks.
JT
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/