Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: Merge issues - m:m not returning all matches
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: RE: Merge issues - m:m not returning all matches
Date: Fri, 20 Jan 2012 15:35:40 +0000
On m:m merges: see the thread last week starting with
http://www.stata.com/statalist/archive/2012-01/msg00370.html
However, please ignore my post in that thread: it missed the point, which is well explained by others.
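In brief, and only as a sketch: within each value of the key, -merge m:m- pairs observations in order (first with first, second with second, and so on), repeating the last observation of the shorter group. It does not form all pairwise combinations, which is why 2 master records and 16 using records give 16 rows rather than 32. -joinby- is the command that forms every combination. The toy data below are made up to mirror the post; the tempfile names and the hosp values are assumptions, not the poster's actual files.

```stata
* Toy master data: 2 service records for one patient (values copied from the post)
clear
input long patiennum double geoid str9 svc_date
12345 25009205500 "01 Aug 09"
12345 25009205500 "05 Sep 10"
end
tempfile patients
save `patients'

* Toy using data: 16 locations for the same census tract (hypothetical)
clear
set obs 16
generate double geoid = 25009205500
generate byte hosp = _n
tempfile dist
save `dist'

* joinby forms all pairwise combinations within geoid
use `patients', clear
joinby geoid using `dist'
count
assert _N == 32
```

By default -joinby- drops observations that have no match in the other dataset; the unmatched() option keeps them if that is wanted.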
Nick
[email protected]
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Aaron Legler
Sent: 20 January 2012 15:25
To: [email protected]
Subject: st: Merge issues - m:m not returning all matches
I am having an issue with merge.
I have one dataset with patient_id and censustract, and another file with
censustract and distance to 16 locations.
When I perform the merge I am not getting all the possible matches.
This is the original patient with 2 records:
patiennum        geoid   svc_date
    12345  25009205500  01 Aug 09
    12345  25009205500  05 Sep 10
After the merge,
merge m:m geoid using chc.censustract.dist.dta
I should get 32 records (2 patient records x 16 locations), but I'm only
getting 16:
patien~m        geoid   svc_date  km_to_~c  hosp       _merge
   12345  25009205500  01 Aug 09    13.701     2  matched (3)
   12345  25009205500  05 Sep 10    15.144     1  matched (3)
   12345  25009205500  05 Sep 10    15.144     5  matched (3)
   12345  25009205500  05 Sep 10    15.144    13  matched (3)
   12345  25009205500  05 Sep 10    15.144    14  matched (3)
   12345  25009205500  05 Sep 10    19.156    12  matched (3)
   12345  25009205500  05 Sep 10    19.156    16  matched (3)
   12345  25009205500  05 Sep 10    20.407     3  matched (3)
   12345  25009205500  05 Sep 10    20.407     4  matched (3)
   12345  25009205500  05 Sep 10    20.407     6  matched (3)
   12345  25009205500  05 Sep 10    20.407     8  matched (3)
   12345  25009205500  05 Sep 10    20.407    11  matched (3)
   12345  25009205500  05 Sep 10    20.407    15  matched (3)
   12345  25009205500  05 Sep 10    25.031     9  matched (3)
   12345  25009205500  05 Sep 10    25.038     7  matched (3)
   12345  25009205500  05 Sep 10    25.583    10  matched (3)
It seems like the system isn't recognizing the differences in svc_date and
just running 1 match.
I checked to ensure the geoids are the same:
. tab geoid

      geoid |      Freq.     Percent        Cum.
------------+-----------------------------------
   2.50e+10 |         16      100.00      100.00
------------+-----------------------------------
      Total |         16      100.00
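
A side note on this check, offered as a sketch rather than a diagnosis: the tabulation shows geoid as 2.50e+10, which is a rounded display, so two different 11-digit identifiers could appear under the same label. Showing all digits, and confirming the variable is stored as double (or long) rather than float, makes the comparison safer. The two geoid values below are invented for illustration.

```stata
* Two hypothetical 11-digit identifiers that differ only in the last digit
clear
input double geoid
25009205500
25009205501
end

* With the default display format the values are shown in rounded form
tab geoid

* Show all digits so that distinct identifiers are visibly distinct
format geoid %15.0f
tab geoid

* Check the storage type: float cannot hold 11-digit integers exactly
describe geoid
assert float(25009205500) != 25009205500
```

If -describe- reports float for an identifier of this size, the values have already been rounded on storage, and the identifier is better re-read as double or as a string.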