Austin--
Okay, but what are the match variables for this supposed
merge? Kelly indicated that there is no identifier linking person to
hospital. (And the point of this task is to, effectively, create a
link that didn't exist before.) I still think that the structure
suggests a -cross-, which is like a -merge- but joins every person
with every hospital; maybe that's what you had in mind anyway. My
suggested code was to avoid an absurdly long dataset. But then
again, maybe it's not too absurd, if the set of variables is
small. You would get 5000000 observations -- not impossible with a
small set of variables. It's a big dataset, but the code is simpler:
    use personfile
    cross using hospitalfile
    * -- compute dist --
    sort person dist
    by person: keep if _n<=5   // or whatever small number you want
Again, I've left out the details of "compute dist". Maybe that's
where you envision a set of nested loops. Maybe you need loops if
the distance computation is complex. But I envisioned some -gen-
formulation; in that case, no looping is needed.
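For instance, if both files carry latitude and longitude in decimal
degrees, the "compute dist" step might be a haversine (great-circle)
formula in a couple of -gen- statements. This is only a sketch; the
names plat, plon, hlat, and hlon are my assumptions, not variables
from Kelly's data:

    * assumed names: plat plon = person's lat/lon, hlat hlon = hospital's
    * (the /// continuation needs a do-file, not the command line)
    gen double a = sin((hlat - plat)*_pi/360)^2 + ///
        cos(plat*_pi/180)*cos(hlat*_pi/180)*sin((hlon - plon)*_pi/360)^2
    * 6371 = mean Earth radius in km; min() guards against rounding above 1
    gen double dist = 2*6371*asin(min(1, sqrt(a))) if !missing(a)
    drop a

The -if !missing(a)- matters because min(1, .) returns 1 in Stata,
which would otherwise give a bogus distance to any record with
missing coordinates.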
By the way, pardon my ignorance, but I haven't yet figured out what YMMV means.
--David
P.S., you might be able to cut down the joining operation if it can
be partitioned -- say by state:
    use personfile
    joinby state using hospitalfile
    * etc...
It will substantially reduce the resulting joined dataset, but that
will eliminate combinations where a person lives near a boundary and
the hospital is on the other side of the boundary -- a fairly common
situation. But maybe there is some other attribute that can be used,
though I can't think of any.
And another matter: Kelly wanted to find the nearest VA hospital and
the nearest non-VA hospital. That will take some more work. Perhaps:
    use personfile
    cross using hospitalfile
    * -- compute dist --
    sort person va dist
    by person va: keep if _n==1
That retains two records per person: one for va, one for non-va. You can then
-reshape wide- if you want it in one-record-per-person form.
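As a sketch of that last step (va is assumed to be coded 0/1, and
hospital is a hypothetical name for whatever hospital identifier the
file carries):

    * keep only what is needed; hospital is an assumed id variable
    keep person va hospital dist
    reshape wide hospital dist, i(person) j(va)

That should leave hospital0 and dist0 for the nearest non-VA hospital
and hospital1 and dist1 for the nearest VA one. The -keep- is there
because -reshape wide- complains about any leftover variable that
varies within person.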
HTH
--David
At 10:43 PM 1/3/2007, you wrote:
>David--
>As may be inferred from my post (by someone with superhuman insight),
>I think it is much easier to -merge- and then compute within nested
>loops, one across all i persons, and one across all j hospitals, as
>Nick does in -nearest- but as always, YMMV.
>
>On 1/3/07, David Kantor <[email protected]> wrote:
>>In response to Kelly Richardson's question about the distance between
>>home and hospital:
>>
>>The structure of this situation suggests a -cross- operation on the
>>two datasets (persons and hospitals) -- at least in theory.
>>This would yield a _very_ long dataset. But this is impractical.
>>You might want to loop through one person at a time, joining hospital
>>data; then select the nearest (or, say the nearest 5 hospitals); then
>>somehow output just these (maybe using -post-).