Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Merge issues - m:m not returning all matches
From: Scott Merryman <[email protected]>
To: [email protected]
Subject: Re: st: Merge issues - m:m not returning all matches
Date: Fri, 20 Jan 2012 09:41:32 -0600
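-merge m:m- does not form all pairwise combinations of observations within a key group; -joinby- does. For example: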
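clear *
* Master file: 2 dated records for one id (like the 2 svc_date records)
set obs 2
gen id = 2500
gen patiennum = 10
gen date = _n
save id, replace
* Using file: 16 distances for the same id (like the 16 locations)
clear
set obs 16
gen id = 2500
gen dist = runiform()
save tract, replace
* merge m:m: this count is 16, not 2 x 16
use id, clear
merge m:m id using tract
count
* joinby: all pairwise combinations within id, so this count is 32
use id, clear
joinby id using tract
count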
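To see why the first -count- gives 16 rather than 32, here is a minimal Python sketch (not Stata) of the two pairing rules inside a single key group, assumed to be non-empty in both files. As the Stata manual describes it, -merge m:m- matches the 1st observation to the 1st, the 2nd to the 2nd, and so on, reusing the last observation of the shorter group; -joinby- forms every pairwise combination. The helper names merge_mm and joinby_pairs are hypothetical, and the toy data only mirror the 2 service dates and 16 hospitals from the question.

```python
# Minimal sketch (not Stata): mimics the documented pairing rules of
# Stata's -merge m:m- and -joinby- inside ONE key group that is assumed
# to be non-empty in both the master and the using file.


def merge_mm(master, using):
    """-merge m:m- rule: 1st with 1st, 2nd with 2nd, ...; the last row of
    the shorter group is reused until the longer group runs out."""
    n = max(len(master), len(using))
    return [
        (master[min(i, len(master) - 1)], using[min(i, len(using) - 1)])
        for i in range(n)
    ]


def joinby_pairs(master, using):
    """-joinby- rule: every pairwise combination within the key group."""
    return [(m, u) for m in master for u in using]


# Toy data mirroring the question: 2 service dates for one patient and
# 16 hospitals (numbered 1..16) sharing the same census tract.
visits = ["01 Aug 09", "05 Sep 10"]
hospitals = list(range(1, 17))

mm = merge_mm(visits, hospitals)
jb = joinby_pairs(visits, hospitals)

print(len(mm), len(jb))  # 16 32
print(sum(v == "01 Aug 09" for v, _ in mm))  # 1
print(sum(v == "05 Sep 10" for v, _ in mm))  # 15
```

With 2 master and 16 using records, the m:m rule yields max(2, 16) = 16 pairs (the first service date once and the second 15 times, the same pattern as in Aaron's listing), while the all-pairs rule yields 2 x 16 = 32. Because the m:m result also depends on the current sort order, the Stata manual advises against -merge m:m-; -joinby- is the usual way to get all combinations.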
Scott
On Fri, Jan 20, 2012 at 9:24 AM, Aaron Legler <[email protected]> wrote:
> I am having an issue with merge -
>
> I have one dataset with patient_id and censustract, and another file with
> censustract and the distance to each of 16 locations.
>
> When I perform the merge I am not getting all the possible matches:
>
> This is the original patient, with 2 records:
>
> patiennum  geoid        svc_date
> 12345      25009205500  01 Aug 09
> 12345      25009205500  05 Sep 10
>
> After the merge (merge m:m geoid using chc.censustract.dist.dta),
>
> I should get 32 records (2 patient records x 16 locations), but I'm only
> getting 16:
>
> patien~m  geoid        svc_date   km_to_~c  hosp  _merge
> 12345     25009205500  01 Aug 09  13.701    2     matched (3)
> 12345     25009205500  05 Sep 10  15.144    1     matched (3)
> 12345     25009205500  05 Sep 10  15.144    5     matched (3)
> 12345     25009205500  05 Sep 10  15.144    13    matched (3)
> 12345     25009205500  05 Sep 10  15.144    14    matched (3)
> 12345     25009205500  05 Sep 10  19.156    12    matched (3)
> 12345     25009205500  05 Sep 10  19.156    16    matched (3)
> 12345     25009205500  05 Sep 10  20.407    3     matched (3)
> 12345     25009205500  05 Sep 10  20.407    4     matched (3)
> 12345     25009205500  05 Sep 10  20.407    6     matched (3)
> 12345     25009205500  05 Sep 10  20.407    8     matched (3)
> 12345     25009205500  05 Sep 10  20.407    11    matched (3)
> 12345     25009205500  05 Sep 10  20.407    15    matched (3)
> 12345     25009205500  05 Sep 10  25.031    9     matched (3)
> 12345     25009205500  05 Sep 10  25.038    7     matched (3)
> 12345     25009205500  05 Sep 10  25.583    10    matched (3)
>
> It seems like the system isn't recognizing the differences in svc_date and
> just running 1 match.
>
> I checked to ensure the geoids are the same:
>
> . tab geoid
>
>       geoid |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>    2.50e+10 |         16      100.00      100.00
> ------------+-----------------------------------
>       Total |         16      100.00
>
> Any suggestions would be very much appreciated. Thanks.
>
> Aaron Legler
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/