I have no problem downloading and using programs written by other people. I
have not used one of Austin's programs, but it is nice to know that it's there
if I need it. Could I write a program on regression discontinuity from scratch?
Sure. Give me two hours. But why would I? I am very grateful to those who have
shared their programs with me, and I hope they find my programs useful too.
When a topic repeatedly comes up on this list, it indicates an unaddressed
problem. Distance matching is a topic of growing importance that is appearing
on this list with increasing frequency, despite the earlier assurance that
this is not a problem in search of a solution.
The solution implemented in -distmatch- is simple yet has never been implemented
before. It is a computationally light solution to a computationally intensive
problem. The number of matches that must be considered is N x N (it's actually
N choose 2 repeated N or N-1 times, depending on how you count). This is a large
number.
The problem with proposing a simple solution is that some people like to entertain
themselves thinking they could have done it on their own.
But it has not been done before, and the problem keeps coming back to this list,
as I said before.
How difficult is distance matching? My first stab was remarkably similar to another
program called -nearest- from SSC, which according to Nick Cox was not meant to
earn a good grade in any computer science course. I am guessing this is the most
obvious solution because this is also the one that Austin was suggesting.
My first program literally took several months to run on about 30,000
observations. I tried parallelizing the code (across multiple computers),
-merge-, grid searching, etc., before settling on the current form, which is at
least 100 times faster than the first one.
This rewriting of the program occurred over the course of two years. If
someone can do this in one sitting, go ahead. Good for them.
But anyone who thinks a casual user can be shown how to do this over
Statalist is wasting everyone's time, as was clearly the case here.
The current non-Stata solution, widely used by economists, is to use ArcGIS or
ArcMap. They cost about $2000-$6000. They usually take several days, if
not weeks, of user work. If you are using a confidential data center (they usually
charge by the hour), that's another $2000 in expenses. Good luck using the latest
versions of these programs because they are even more difficult to use. Be grateful
if you have never had to use one of these.
Roy
> Laura--
> You don't actually need to download anything to solve this kind of
> problem, or much harder similar problems, as illustrated by e.g.
> http://www.stata.com/statalist/archive/2009-07/msg00261.html
> http://www.stata.com/statalist/archive/2007-01/msg00098.html
> and similar posts.
>
> I particularly doubt the final claim in the help file for -distmatch-
> in the paragraph "Distance matching is computationally intensive.
> Observations of 3,000 may take several minutes to complete. Other
> methods typically take days if not weeks and requires extensive
> user-involvement."
>
> use farms, clear
> local nf=_N
> g double mindist=.
> merge using waterbodies
> local R=6367.44
> qui forv i=1/`nf' {
> local x1=farm_Y[`i']
> local y1=farm_X[`i']
> local x2 wat_Y
> local y2 wat_X
> g double L=(`y2'-`y1')*_pi/180
> replace L=(`y2'-`y1'-360)*_pi/180 if L>_pi
> replace L=(`y2'-`y1'+360)*_pi/180 if L<-_pi
> local t1 acos(sin(`x2'*_pi/180)*sin(`x1'*_pi/180)
> g double d=`t1'+cos(`x2'*_pi/180)*cos(`x1'*_pi/180)*cos(L))*`R'
> su d, meanonly
> replace mindist=r(min) in `i'
> drop L d
> }
> drop _m waterbody_ID wat_X wat_Y
> la var mindist "Distance to center of nearest body of water"
> or adapt as appropriate...