We can keep the discussion here if others are interested. And you are
welcome.
Yes, that is what I mean by grid-search. There is no point in comparing
something that is not in the neighborhood.
The quickiest way I have found is the cut the map into overlapping grids.
This is an old code that was in process of revision (about a year old),
but it will do 2 million observations in about 5 minutes assuming random
dispersion of points.
In Laura's case, she just need to reverse the comparison group.
Roy
clear
set mem 1g
set obs 2000000
gen id=_n
set seed 123
gen longit=uniform()*10000
gen latit=uniform()*10000
sum
cap program drop distance
program define distance
syntax [using], kernel(real) RADius(real)
gen X=round(longit,`kernel')
gen Y=round(latit,`kernel')
tempfile file0 file1
gen markX=.
save `file0'
replace markX=0 if markX==.
append using `file0'
replace markX=1 if markX==.
append using `file0'
replace markX=2 if markX==.
gen markY=.
save `file1'
replace markY=3 if markY==.
append using `file1'
replace markY=4 if markY==.
append using `file1'
replace markY=5 if markY==.
*sort id markX markY
replace X=X-`kernel' if markX==1
replace X=X+`kernel' if markX==2
replace Y=Y-`kernel' if markY==4
replace Y=Y+`kernel' if markY==5
gen mark=0 if markX==0 & markY==3
egen XY=group(X Y)
bys XY: gen count=_N
keep if count>1
* drop if the middle part (mark==0) is not there
bys XY: egen min=min(mark)
drop if min~=0
drop min
* renumber
drop XY
egen XY=group(X Y)
*** re-sorting is necessary
sort XY
drop count
bys XY: gen count=_N
*gen row=_n
*bys XY: gen begin=row if _n==1
*bys XY: replace begin=begin[_n-1] if begin[_n-1]~=.
*bys XY: gen end0=row if _n==_N
*bys XY: egen end=max(end0)
*drop end0
forval num=1/10 {
gen match`num'=.
gen dist`num'=.
}
egen max=max(XY)
local max=max
drop max
local place=1
while `place'=0 & `=mark[`first']'==0 {
if `dist'<=`radius' & `dist'>0 & `=mark[`first']'==0 {
di " `max' `=id[`first']' `=id[`second']' `dist'"
local num=`num'+1
qui replace match`num'=`=id[`second']' in `first'
qui replace dist`num'=`dist' in `first'
}
}
}
local place=`place'+`=count[`place']'
}
end
distance, kernel(.7) rad(.1)
>
> To quote GH Hardy: "I am reluctant to intrude in a discussion concerning
> matters of which I have no expert knowledge,..."
>
> But, it seems to me that for each farm, you could create a quick screen
> that would remove the polygon points that were unlikely to be close to
> the farm.
>
> If f_lat and f_long are the coordinates of the farm and the coordinates
> for each polygon point are p_lat and p_long, then
>
> gen fp_lat = f_lat - p_lat
> gen fp_long = f_long - p_long
> gen dist = sqrt(fp_lat^2 + fp_long^2)
> qui sum dist, detail
> keep if dist
> If all your farms and lakes are in the UK, you would need to add a
> constant to the longitudes before calculating the differences (modulo
> 180 or 360; I don't know how longitude is specified in the data).
>
_________________________________________________________________
Hotmail: Powerful Free email with security by Microsoft.
http://clk.atdmt.com/GBL/go/171222986/direct/01/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/