Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Michael N. Mitchell" <Michael.Norman.Mitchell@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: merging datasets |
Date | Tue, 09 Nov 2010 14:55:05 -0800 |
Dear Mike,

I don't know if this is helpful, but when I have encountered this kind of data, this is the strategy I have used. Call the two databases A and B. I would start by matching A and B on all of the variables (first, middle, last, dob, ssn). Some observations will match on all criteria; call those criterion-one matches. Take the remaining unmatched observations and try to match them on a looser criterion, for example everything but middle name; call those criterion-two matches. Take the observations that are still unmatched and try matching them again on an even looser criterion. Repeat this process, continuing to loosen the matching criteria. At the end, I might be matching on a criterion that is too loose for my comfort (such as last name only). You can then do a frequency count, among the matched records, of how many matched at each criterion level (including the one that is too loose for comfort). You can then weigh the number of matches against the strictness of the criteria to decide the optimal balance between the number of matches and the quality of the match criteria.
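The tiered strategy above can be sketched in a few lines. This is a minimal sketch in Python rather than Stata, with each database held as a list of dicts. The tier list, the helper names (`make_key`, `index_by_key`, `tiered_match`) and the toy records are all hypothetical. It also assumes that a key shared by more than one record on either side is ambiguous and should be left unmatched for manual review rather than forced, which is one way to handle men who share a name. In Stata, each tier would roughly correspond to a separate merge on a shrinking set of key variables.

```python
from collections import Counter

# Hypothetical field names taken from the thread: first, middle, last, dob, ssn.
# Tiers run from strictest to loosest; the last one is "too loose for comfort".
TIERS = [
    ("first", "middle", "last", "dob", "ssn"),  # tier 1: all variables
    ("first", "last", "dob", "ssn"),            # tier 2: drop middle name
    ("first", "last", "dob"),                   # tier 3: also drop SSN
    ("first", "last"),                          # tier 4: names only
    ("last",),                                  # tier 5: last name only
]


def make_key(record, fields):
    """Build a match key from `fields`; None if any of them is missing."""
    values = tuple(record.get(f) for f in fields)
    if any(v is None or v == "" for v in values):
        return None
    return values


def index_by_key(records, indices, fields):
    """Map each non-missing key to the list of record indices that carry it."""
    index = {}
    for i in sorted(indices):
        key = make_key(records[i], fields)
        if key is not None:
            index.setdefault(key, []).append(i)
    return index


def tiered_match(a_records, b_records, tiers=TIERS):
    """Match A to B one tier at a time; only unmatched records reach the next tier.

    Returns (matches, unmatched_a), where matches holds (a_index, b_index, tier).
    Assumption: a key shared by more than one record on either side is treated
    as ambiguous and skipped, so those records stay unmatched for manual review.
    """
    unmatched_a = set(range(len(a_records)))
    unmatched_b = set(range(len(b_records)))
    matches = []
    for tier, fields in enumerate(tiers, start=1):
        a_index = index_by_key(a_records, unmatched_a, fields)
        b_index = index_by_key(b_records, unmatched_b, fields)
        for key, a_hits in a_index.items():
            b_hits = b_index.get(key, [])
            if len(a_hits) == 1 and len(b_hits) == 1:
                matches.append((a_hits[0], b_hits[0], tier))
                unmatched_a.discard(a_hits[0])
                unmatched_b.discard(b_hits[0])
    return sorted(matches), sorted(unmatched_a)


# Toy data (invented for illustration); "" and None mark missing values.
A = [
    {"first": "John", "middle": "Q", "last": "Smith", "dob": "1950-01-02", "ssn": "111"},
    {"first": "Tom", "middle": "", "last": "Jones", "dob": "1960-03-04", "ssn": "222"},
    {"first": "Pete", "middle": "", "last": "Brown", "dob": None, "ssn": None},
    {"first": "Mark", "middle": "", "last": "Davis", "dob": None, "ssn": None},
]
B = [
    {"first": "John", "middle": "Q", "last": "Smith", "dob": "1950-01-02", "ssn": "111"},
    {"first": "Tom", "middle": "R", "last": "Jones", "dob": "1960-03-04", "ssn": "222"},
    {"first": "Peter", "middle": "X", "last": "Brown", "dob": "1945-05-06", "ssn": "333"},
    {"first": "Mark", "middle": "A", "last": "Davis", "dob": "1970-01-01", "ssn": "444"},
    {"first": "Mark", "middle": "B", "last": "Davis", "dob": "1971-01-01", "ssn": "555"},
]

matches, unmatched = tiered_match(A, B)
tier_counts = Counter(tier for _, _, tier in matches)  # frequency count per tier
print(matches)            # [(0, 0, 1), (1, 1, 2), (2, 2, 5)]
print(unmatched)          # [3]: two B records are named Mark Davis
print(dict(tier_counts))  # {1: 1, 2: 1, 5: 1}
```

Here the tier-5 match pairs "Pete Brown" with "Peter Brown" on last name alone, which is the kind of match the frequency count helps you judge. The two B records named Mark Davis leave that A record unmatched, to be checked by hand (for example, against visit dates).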
I hope this helps,
Michael N. Mitchell
Data Management Using Stata - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week - http://www.MichaelNormanMitchell.com

On 2010-11-08 8.50 PM, Michael Eisenberg wrote:
Colleagues,

I have a database of about 20K men that I'd like to merge with another database. I have names (first, middle, and last) as well as date of birth and social security number for most men. Unfortunately, the original database has some missing data on birthdate and social security number. The new database has most of the birthdate information as well as the geographic information that I need. Some men do have the same name. Is there any way to merge based on name if it doesn't uniquely identify men? I'd like to somehow match all men and then manually compare the candidates based on visit dates to decide whether each match is likely correct. If not, any suggestions?

Thanks for your help.
Mike

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/