From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: looping with geodist
Date: Fri, 7 May 2010 11:23:45 -0400

Frederick Guy <f.guy@bbk.ac.uk> :
There are numerous examples addressing your need in the Archives, e.g.:
http://www.stata.com/statalist/archive/2009-09/msg00473.html
http://www.stata.com/statalist/archive/2009-07/msg00261.html
http://www.stata.com/statalist/archive/2007-01/msg00098.html

Note also the calculation of distance (using an approximation that assumes the Earth is a sphere; see -vincenty- on SSC for an alternative) between two points on Earth measured in decimal degrees lat/lon occupies a large fraction of msg00473's code, but need not; all the calculations could be telescoped into one line (it's just easier to break it up), and the local macros are mostly unnecessary. Plus, the formula in that message is the weakest of many alternatives for great-circle distance; see e.g.
http://en.wikipedia.org/wiki/Great-circle_distance
(but downloading a package for any of those spherical approximate computations seems like overkill).

As far as I know, the unmatched merge approach was first promulgated in January 2007 (see e.g.
http://www.stata.com/statalist/archive/2007-01/msg00082.html
but the name came later; the approach was developed in 2003 for a paper published 2009 in the JHE--see also Appendix A of http://www.nber.org/papers/w13246 if you are interested in inverse distance weights) as a way to have two datasets in memory at once; another way is to repeatedly merge or append a second dataset onto a single observation from the first, but this is understandably less efficient.

The crucial detail to remember with an unmatched merge strategy (merging on _n rather than any variables) is that all the variable names must be distinct across the two datasets. Suppose your location variables are xi,yi,xj,yj and you have Ni obs of type i and Nj obs of type j.
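
To make the setup concrete, here is a minimal sketch of the unmatched merge on toy data. The coordinates are made up purely for illustration, and the file names type_i and type_j are just placeholders for your own two files. It uses the Stata 11 spelling -merge 1:1 _n using-, which pairs rows by observation number rather than by any key variable. Here x is latitude and y is longitude, in decimal degrees.

```stata
* Toy data only: these coordinates are invented for illustration.
clear
input double(xi yi)
51.52 -0.13
48.86  2.35
end
save type_i, replace

clear
input double(xj yj)
52.52 13.40
40.42 -3.70
41.90 12.50
end
save type_j, replace

* Unmatched merge: no key variables, so rows are paired by _n.
* This only works because no variable name appears in both files.
use type_i, clear
local Ni = _N
merge 1:1 _n using type_j
drop _merge
list, noobs
```

With Ni=2 and Nj=3 the merged data have 3 rows, and xi,yi are missing in the third row; that is harmless as long as you loop over 1/`Ni' and refer to xi[`i'] and yi[`i'] by explicit subscript.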
If you want distances to each location of type j stored on the type i obs, you will need Nj new variables to store distances; if you only want summary stats across locations of type j, you should not create that many new variables at once, to conserve memory. Suppose you want the sum of inverse distances (assuming none are zero; multiply by a weight before summing if you want a weighted sum); then you could just:

* x is latitude, y is longitude, both in decimal degrees
use type_i, clear
local Ni=_N
merge using type_j
g w=.
qui forv i=1/`Ni' {
 * longitude difference in radians, wrapped into [-pi, pi]
 g double L=(yj-yi[`i'])*_pi/180
 replace L=(yj-yi[`i']-360)*_pi/180 if L<. & L>_pi
 replace L=(yj-yi[`i']+360)*_pi/180 if L<-_pi
 * t1 is left unbalanced on purpose; its parenthesis closes in the next line
 local t1 acos(sin(xj*_pi/180)*sin(xi[`i']*_pi/180)
 * inverse of the great-circle distance in km (sphere of radius 6367.44 km)
 g i=1/(`t1'+cos(xj*_pi/180)*cos(xi[`i']*_pi/180)*cos(L))*6367.44)
 su i, meanonly
 * store the sum over all type j locations on the current type i obs
 replace w=r(sum) in `i'
 drop L i
}
la var w "Sum of Inverse (Approx) Distances"

On Fri, May 7, 2010 at 4:35 AM, Frederick Guy <f.guy@bbk.ac.uk> wrote:
> Robert Picard sent the code below, which works as advertised - many
> thanks, Robert! Now I have a slightly different problem: I have two
> kinds of locations in the data, i and j. For each location of type i,
> I need to compute the distances to every location of type j. If I just
> stack observations type i on top of observations type j, geodist
> doesn't like the missing values (observations type i have missing
> values for type j, and vice versa). Can anybody suggest a solution?
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Robert Picard
> Sent: 30 April 2010 17:09
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: RE: RE: RE: AW: Creating index relative to other observations
>
> Perhaps the following example is close to what you are trying to do.
> It loops through all observations. Each time, it calculates the
> distance from observation `i' to all others (distance will be missing
> for the observation `i'). Values for variable x1 are adjusted
> according to the distance to `i' and summed.
> The observation `i' of x3 is then updated with the value of the sum
> plus the value of x2 for observation `i'.
>
> Hope this helps,
>
> Robert
> http://robertpicard.com/
>
> *--------------------------- begin example -----------------------
> version 11
>
> * This example requires my -geodist- program available on SSC
> * To install: ssc install geodist
>
> clear all
> set obs 5
> set seed 1234
> gen lat = 37 + (41 - 37) * uniform()
> gen lon = -109 + (109 - 102) * uniform()
> gen x1 = round(uniform()*100)
> gen x2 = round(uniform()*100)
> gen x3 = .
>
> forvalues i = 1/`c(N)' {
>     geodist lat lon `=lat[`i']' `=lon[`i']' if _n != `i', gen(d)
>     gen xtemp = x1 / d
>     sum xtemp, meanonly
>     qui replace x3 = r(sum) + x2 in `i'
>     list
>     drop d xtemp
> }
> *--------------------- end example --------------------------
>
>
> On Fri, Apr 30, 2010 at 7:49 AM, Frederick Guy <f.guy@bbk.ac.uk> wrote:
>> Many thanks. Now for a crash-course in MATA...
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: 29 April 2010 19:22
>> To: statalist@hsphsun2.harvard.edu
>> Subject: st: RE: RE: AW: Creating index relative to other observations
>>
>> I'd do this in Mata. Mata has a -for- loop.
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Frederick Guy
>>
>> Thanks, I guess I was unclear on this aspect of the problem. For each
>> observation, the sum I'm talking about is of measurements made relative
>> to all other observations (or more generally, to some set of other
>> observations) in the sample.
>>
>> Martin Weiss
>>
>> ".. sum up the results of these computations,".
>>
>> Creating sums can mean different things in Stata. It may sound trite,
>> but the easiest is simply to -generate- a sum by adding values with a
>> "+" sign. If you want the total of a variable, look at -egen, total()-.
>> If you want a running sum, take a look at -help sum()-.
>>
>> Frederick Guy
>>
>> I need to use information from all observations (about 1800 of
>> them) to create a new variable.
>>
>> The variable created is a weighted sum of the inverse of geographical
>> distances between observation i and all j n.e. i. I have longitude and
>> latitude for each observation, and computation of the distance from any
>> i to any j is straightforward. What I don't know is how to get Stata to
>> loop over all observations and sum up the results.
>>
>> For every observation i, I think I need to
>>
>> (a) loop through all j n.e. i, doing computations involving variables
>> x1, x2(i) and x1, x2(j), and then
>>
>> (b) sum up the results of these computations, returning a value which
>> becomes variable x3 for that i.
>>
>> I expect there's a straightforward way to do this. Any suggestions?
>>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/