Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: identifying duplicate records
From: raoul reulen <[email protected]>
To: [email protected]
Subject: st: identifying duplicate records
Date: Fri, 10 Feb 2012 14:08:04 +0000
Hello
Just wondering if I could get some advice. I have a large database
with around 300,000 records of individuals. There can be more than one
record per individual. Now, how do I identify individuals? I assume
that two records belong to the same individual if:
date of birth and NHS number are the same, OR
date of birth and surname are the same, OR
surname and NHS number are the same.
So there are various combinations possible. A date of birth could have
typos in it; but if the NHS number and the surname are the same then I
assume it is the same person. The NHS number can have typos, but if
the date of birth and the surname are the same I will assume it is the
same person.
What is the best way to approach this? I want to end up with an
id-number that identifies the individual. Many thanks for your help.
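For illustration only, the rule above (any two of the three fields agree) can be sketched as a grouping problem: link every pair of records that shares one of the three field pairs, then give each linked group one id. The sketch below is in Python rather than Stata and uses a union-find structure. The function name `assign_person_ids` and the field names `dob`, `nhs` and `surname` are hypothetical, and it assumes exact matching with blank values never matching. Note that matches chain: if record A matches B and B matches C, all three get the same id, which may or may not be what is wanted.

```python
def assign_person_ids(records):
    """Return one id per record; records sharing any two of three fields are linked.

    `records` is a list of dicts with the hypothetical keys
    'dob', 'nhs' and 'surname' (assumed names, not from the original data).
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        # Merge two groups, keeping the smaller index as the root.
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # The three matching rules from the post, one field pair each.
    for a, b in [("dob", "nhs"), ("dob", "surname"), ("surname", "nhs")]:
        first_seen = {}
        for i, rec in enumerate(records):
            va, vb = rec.get(a), rec.get(b)
            if not va or not vb:
                continue  # assumption: missing values never count as a match
            key = (va, vb)
            if key in first_seen:
                union(first_seen[key], i)
            else:
                first_seen[key] = i

    # Number the groups 1, 2, 3, ... in order of first appearance.
    ids = {}
    result = []
    for i in range(len(records)):
        root = find(i)
        if root not in ids:
            ids[root] = len(ids) + 1
        result.append(ids[root])
    return result


example = [
    {"dob": "1970-01-01", "nhs": "111", "surname": "smith"},
    {"dob": "1970-01-01", "nhs": "112", "surname": "smith"},  # NHS typo
    {"dob": "1970-01-02", "nhs": "112", "surname": "smith"},  # date typo
    {"dob": "1970-01-01", "nhs": "999", "surname": "jones"},
    {"dob": "1980-05-05", "nhs": "999", "surname": "jones"},
]
person_ids = assign_person_ids(example)
```

In this made-up example the first three records end up in one group, because the second links to the first through date of birth and surname and the third links to the second through surname and NHS number. The fourth record shares only a date of birth with the first, so it stays separate from that group.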
Raoul