Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <n.j.cox@durham.ac.uk>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Merge issues - m:m not returning all matches
Date: Fri, 20 Jan 2012 15:40:14 +0000
Also, your problem sounds more like one for -joinby-.

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: Nick Cox
Sent: 20 January 2012 15:36
To: 'statalist@hsphsun2.harvard.edu'
Subject: RE: Merge issues - m:m not returning all matches

On m:m merges: see the thread last week starting with

http://www.stata.com/statalist/archive/2012-01/msg00370.html

However, please ignore my post in that thread: it missed the point, which is well explained by others.

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Aaron Legler
Sent: 20 January 2012 15:25
To: statalist@hsphsun2.harvard.edu
Subject: st: Merge issues - m:m not returning all matches

I am having an issue with -merge-. I have one dataset with patient_id and censustract, and another file with censustract and the distance to each of 16 locations. When I perform the merge, I am not getting all the possible matches.

This is the original patient, with 2 records:

patiennum        geoid   svc_date
    12345  25009205500  01 Aug 09
    12345  25009205500  05 Sep 10

After the merge:

merge m:m geoid using chc.censustract.dist.dta

I should get 32 records (2 patient records x 16 locations), but I'm only getting 16:

patien~m        geoid   svc_date   km_to_~c   hosp       _merge
   12345  25009205500  01 Aug 09     13.701      2  matched (3)
   12345  25009205500  05 Sep 10     15.144      1  matched (3)
   12345  25009205500  05 Sep 10     15.144      5  matched (3)
   12345  25009205500  05 Sep 10     15.144     13  matched (3)
   12345  25009205500  05 Sep 10     15.144     14  matched (3)
   12345  25009205500  05 Sep 10     19.156     12  matched (3)
   12345  25009205500  05 Sep 10     19.156     16  matched (3)
   12345  25009205500  05 Sep 10     20.407      3  matched (3)
   12345  25009205500  05 Sep 10     20.407      4  matched (3)
   12345  25009205500  05 Sep 10     20.407      6  matched (3)
   12345  25009205500  05 Sep 10     20.407      8  matched (3)
   12345  25009205500  05 Sep 10     20.407     11  matched (3)
   12345  25009205500  05 Sep 10     20.407     15  matched (3)
   12345  25009205500  05 Sep 10     25.031      9  matched (3)
   12345  25009205500  05 Sep 10     25.038      7  matched (3)
   12345  25009205500  05 Sep 10     25.583     10  matched (3)

It seems as if Stata isn't recognizing the difference in svc_date and is making only one match per location. I checked to ensure the geoids are the same:

. tab geoid

      geoid |      Freq.     Percent        Cum.
------------+-----------------------------------
   2.50e+10 |         16      100.00      100.00
------------+-----------------------------------
      Total |         16      100.00
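To see why -merge m:m- returns 16 observations rather than 32, here is a small self-contained reproduction with made-up data. Only the tract ID, the variable names patiennum, geoid, svc_date and hosp, and the 2 x 16 layout come from the post; the toy files themselves are hypothetical. Within each key value, -merge m:m- pairs the first master observation with the first using observation, the second with the second, and then reuses the last observation of the shorter group. That is the pattern in the listing above: 01 Aug 09 appears once and 05 Sep 10 appears fifteen times. -joinby- instead forms all pairwise combinations within geoid. Treat this as a sketch under those assumptions.

```stata
* Toy reproduction with hypothetical data mimicking the post:
* one patient with two service dates in one census tract, 16 locations
clear
tempfile dist pats

* using file: 16 locations for the same tract; geoid is stored as double
* because a float cannot hold an 11-digit ID exactly
set obs 16
generate double geoid = 25009205500
generate hosp = _n
save `dist'

* master file: two service dates for patient 12345
clear
input patiennum double geoid str9 svc_date
12345 25009205500 "01 Aug 09"
12345 25009205500 "05 Sep 10"
end
save `pats'

* merge m:m pairs records in order within geoid and then reuses the
* last master record, so the result has max(2, 16) = 16 observations
merge m:m geoid using `dist'
assert _N == 16

* joinby forms every pairing within geoid: 2 x 16 = 32 observations
use `pats', clear
joinby geoid using `dist'
assert _N == 32
tab svc_date
```

With the real files, the corresponding step would presumably be to -use- the patient data and then type joinby geoid using "chc.censustract.dist.dta". By default -joinby- drops observations that have no match in the other file; its unmatched() option changes that. geoid should be stored the same way (numeric or string) in both files.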