Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Fwd: st: RE: Calculate Distance between Properties within Portfolios
From
Robert Picard <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Fwd: st: RE: Calculate Distance between Properties within Portfolios
Date
Mon, 16 Sep 2013 12:31:25 -0400
Indeed, the -long- option for -geonear- is different and
does not merge back with the original data because multiple
neighbors are allowed, which could lead to a "m:m" merge.
For futur reference, here's an revised version that shows
how to use -geonear- in long form mode for every value of a
variable.
Robert
* --------------- begin example ---------------------------
set seed 1234
clear
set obs 20
gen porfolio_id = 1000 + _n
expand runiform() * 360 + 2
sort porfolio_id
by porfolio_id: gen propid = _n
sort porfolio_id propid
gen double lat = runiform()
gen double lon = runiform()
tempfile main
save "`main'"
* brute force approach
rename (propid lat lon) (propid0 lat0 lon0)
joinby porfolio_id using "`main'"
drop if propid == propid0
isid porfolio_id propid propid0, sort
geodist lat lon lat0 lon0, gen(km_brute) sphere
sort porfolio_id propid km_brute propid0
by porfolio_id propid: keep if _n == 1
tempfile brute
save "`brute'"
* using -geonear- from SSC
use "`main'", clear
clonevar propid0 = propid
save "`main'", replace
levelsof porfolio_id, clean
local pidlist `r(levels)'
local npid 0
tempfile nbors
qui foreach pid of local pidlist {
use if porfolio_id == `pid' using "`main'", clear
save "`nbors'", replace
geonear propid lat lon using "`nbors'", ///
n(propid0 lat lon) ignore long
gen porfolio_id = `pid'
local ++npid
tempfile res`npid'
save "`res`npid''"
}
clear
forvalues i=1/`npid' {
append using "`res`i''"
}
rename propid0 propid0_gn
merge 1:1 porfolio_id propid using "`brute'", nogen
assert propid0_gn == propid0
assert abs(km_brute - km_to_propid0) < 1e-12
* --------------- end example -----------------------------
---------- Forwarded message ----------
From: S. McKay Price <[email protected]>
Date: Mon, Sep 16, 2013 at 11:14 AM
Subject: Re: st: RE: Calculate Distance between Properties within Portfolios
To: [email protected]
Thank you Joe and Robert for your insight and the elegant coding
example. Both approaches calculated the distances between properties.
As expected, the brute force approach (using -joinby- ) created
millions of observations and took about five minutes to run (my
machine has 16GB of RAM).
The -geonear- approach using the loop generated the distances too,
although I was unable to get it to keep the unique portfolio
identifiers (for later merging and analysis) when specifying the
-long- option. That is to say, when I slightly altered Robert's code
to include "long within(24000)" as follows:
* --------------- begin example ----------------------------
* using -geonear- from SSC
use "`main'", clear
sum pfolio, meanonly
local npid = r(max)
tempfile nbors
qui forvalues i = 1/`npid' {
use if pfolio == `i' using "`main'", clear
save "`nbors'", replace
geonear propid lat lon using "`nbors'", ///
n(propid lat lon) ignore long within(24000)
tempfile res`i'
save "`res`i''"
}
clear
forvalues i = 1/`npid' {
append using "`res`i''"
}
* --------------- end example -----------------------------
Thanks again! Your suggestions were most helpful and accomplished
what I needed.
McKay
On 9/11/2013 1:29 AM, Robert Picard wrote:
>
> As Joe said, -joinby- is the tool to go if you are going to
> do this using a brute force approach. You can also get the
> big guns and use -geonear- (from SSC). You will have to do
> each portfolio separately but it's still going to be faster
> than the brute force approach.
>
> * --------------- begin example ---------------------------
>
> set seed 1234
> clear
> set obs 20
> gen porfolio_id = 1000 + _n
> egen pfolio = group(porfolio_id)
> expand runiform() * 360 + 2
> sort pfolio
> by pfolio: gen propid = _n
> sort pfolio propid
> gen double lat = runiform()
> gen double lon = runiform()
> tempfile main
> save "`main'"
>
> * brute force approach
> rename (propid lat lon) (propid0 lat0 lon0)
> joinby pfolio using "`main'"
> drop if propid == propid0
> isid pfolio propid propid0, sort
> geodist lat lon lat0 lon0, gen(km_brute) sphere
> sort pfolio propid km_brute propid0
> by pfolio propid: keep if _n == 1
> tempfile brute
> save "`brute'"
>
> * using -geonear- from SSC
> use "`main'", clear
> sum pfolio, meanonly
> local npid = r(max)
> tempfile nbors
> qui forvalues i = 1/`npid' {
> use if pfolio == `i' using "`main'", clear
> save "`nbors'", replace
> geonear propid lat lon using "`nbors'", ///
> n(propid lat lon) ignore
> tempfile res`i'
> save "`res`i''"
> }
> clear
> forvalues i = 1/`npid' {
> append using "`res`i''"
> }
> merge 1:1 pfolio propid using "`brute'", nogen
> assert nid == propid0
> assert abs(km_brute - km_to_nid) < 1e-12
>
> * --------------- end example -----------------------------
>
>
> On Tue, Sep 10, 2013 at 9:49 PM, Joe Canner <[email protected]> wrote:
>>
>> McKay,
>>
>> Take a look at -joinby-. You will probably have to create a duplicate copy of your dataset and rename the property_id, lat, and lon variables in the duplicated data set. Then do:
>>
>> . use original.dta
>> . joinby portfolio_id using duplicate.dta
>> . geodist lat lon duplat duplon
>>
>> (Warning: this will create about 6.5 million records.)
>>
>> Regards,
>> Joe
>> ________________________________________
>> From: [email protected] [[email protected]] on behalf of S. McKay Price [[email protected]]
>> Sent: Tuesday, September 10, 2013 6:28 PM
>> To: [email protected]
>> Subject: st: Calculate Distance between Properties within Portfolios
>>
>> Hello,
>>
>> I'm trying to calculate the distance, in miles or kilometers, between
>> all possible pairwise combinations of properties within a given
>> portfolio. Is there an efficient way to structure the data to
>> accomplish this?
>>
>> My data include numerous portfolios (roughly 200), each with a unique
>> portfolio identifier (portfolio_id). And, there are multiple properties
>> within each portfolio (180 on average), where each property has a unique
>> property identifier (property_id). I have latitude and longitude
>> coordinates in decimal form for each property (e.g. 42.270873
>> -83.726329) for use in a command such as -geodist- from SSC, or
>> something similar. The data are organized as follows:
>>
>> portfolio_id property_id latitude longitude
>> 1 1 lat lon
>> 1 2 lat lon
>> 1 3 lat lon
>> ...
>> 2 1 lat lon
>> 2 2 lat lon
>> 2 3 lat lon
>> etc...
>>
>> Any suggestions? Thank you for your consideration.
>>
>> McKay
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/