Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: RE: Calculate Distance between Properties within Portfolios
From
"S. McKay Price" <[email protected]>
To
[email protected]
Subject
Re: st: RE: Calculate Distance between Properties within Portfolios
Date
Mon, 16 Sep 2013 11:14:35 -0400
Thank you Joe and Robert for your insight and the elegant coding
example. Both approaches calculated the distances between properties.
As expected, the brute force approach (using -joinby- ) created millions
of observations and took about five minutes to run (my machine has 16GB
of RAM).
The -geonear- approach using the loop generated the distances too,
although I was unable to get it to keep the unique portfolio identifiers
(for later merging and analysis) when specifying the -long- option.
That is to say, when I slightly altered Robert's code to include "long
within(24000)" as follows:
* --------------- begin example ----------------------------
* using -geonear- from SSC
use "`main'", clear
sum pfolio, meanonly
local npid = r(max)
tempfile nbors
qui forvalues i = 1/`npid' {
use if pfolio == `i' using "`main'", clear
save "`nbors'", replace
geonear propid lat lon using "`nbors'", ///
n(propid lat lon) ignore long within(24000)
tempfile res`i'
save "`res`i''"
}
clear
forvalues i = 1/`npid' {
append using "`res`i''"
}
* --------------- end example -----------------------------
Thanks again! Your suggestions were most helpful and accomplished what
I needed.
McKay
On 9/11/2013 1:29 AM, Robert Picard wrote:
As Joe said, -joinby- is the tool to go if you are going to
do this using a brute force approach. You can also get the
big guns and use -geonear- (from SSC). You will have to do
each portfolio separately but it's still going to be faster
than the brute force approach.
* --------------- begin example ---------------------------
set seed 1234
clear
set obs 20
gen porfolio_id = 1000 + _n
egen pfolio = group(porfolio_id)
expand runiform() * 360 + 2
sort pfolio
by pfolio: gen propid = _n
sort pfolio propid
gen double lat = runiform()
gen double lon = runiform()
tempfile main
save "`main'"
* brute force approach
rename (propid lat lon) (propid0 lat0 lon0)
joinby pfolio using "`main'"
drop if propid == propid0
isid pfolio propid propid0, sort
geodist lat lon lat0 lon0, gen(km_brute) sphere
sort pfolio propid km_brute propid0
by pfolio propid: keep if _n == 1
tempfile brute
save "`brute'"
* using -geonear- from SSC
use "`main'", clear
sum pfolio, meanonly
local npid = r(max)
tempfile nbors
qui forvalues i = 1/`npid' {
use if pfolio == `i' using "`main'", clear
save "`nbors'", replace
geonear propid lat lon using "`nbors'", ///
n(propid lat lon) ignore
tempfile res`i'
save "`res`i''"
}
clear
forvalues i = 1/`npid' {
append using "`res`i''"
}
merge 1:1 pfolio propid using "`brute'", nogen
assert nid == propid0
assert abs(km_brute - km_to_nid) < 1e-12
* --------------- end example -----------------------------
On Tue, Sep 10, 2013 at 9:49 PM, Joe Canner <[email protected]> wrote:
McKay,
Take a look at -joinby-. You will probably have to create a duplicate copy of your dataset and rename the property_id, lat, and lon variables in the duplicated data set. Then do:
. use original.dta
. joinby portfolio_id using duplicate.dta
. geodist lat lon duplat duplon
(Warning: this will create about 6.5 million records.)
Regards,
Joe
________________________________________
From: [email protected] [[email protected]] on behalf of S. McKay Price [[email protected]]
Sent: Tuesday, September 10, 2013 6:28 PM
To: [email protected]
Subject: st: Calculate Distance between Properties within Portfolios
Hello,
I'm trying to calculate the distance, in miles or kilometers, between
all possible pairwise combinations of properties within a given
portfolio. Is there an efficient way to structure the data to
accomplish this?
My data include numerous portfolios (roughly 200), each with a unique
portfolio identifier (portfolio_id). And, there are multiple properties
within each portfolio (180 on average), where each property has a unique
property identifier (property_id). I have latitude and longitude
coordinates in decimal form for each property (e.g. 42.270873
-83.726329) for use in a command such as -geodist- from SSC, or
something similar. The data are organized as follows:
portfolio_id property_id latitude longitude
1 1 lat lon
1 2 lat lon
1 3 lat lon
...
2 1 lat lon
2 2 lat lon
2 3 lat lon
etc...
Any suggestions? Thank you for your consideration.
McKay
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/