Hi all,
I have a dataset of birth records from a particular region covering
10 years. Many of the births are to siblings or half-siblings. I need to
match siblings and half-siblings using a number of identifiers. I have
a unique maternal ID number for some of the sample; for the remainder
of the sample I'll have to match on name, geocode, blood type, etc.
Having never done this before, I'm not sure if I'm on the right track
or using the most accurate/efficient approach.
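Here are the commands I'm using to match on the maternal ID:
* count other births sharing the same maternal ID (births without an ID excluded)
duplicates tag maternal_id if !missing(maternal_id), gen(id_match)
gen parity = id_match if id_match<=10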
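Here is a minimal, self-contained sketch of the maternal-ID step. It assumes `maternal_id` is numeric and is missing for births that could not be linked to a mother. The toy data are invented for illustration. The sketch uses `bysort` and `_N` instead of `duplicates tag`, so births with no ID stay missing rather than being counted as one large group of "duplicates":

```stata
* Toy data, invented for illustration: maternal_id is missing
* for births that were not linked to a mother.
clear
input long maternal_id
101
101
101
202
.
.
end

* id_match = number of OTHER births sharing the same maternal ID
* (0 = only birth on file); left missing when maternal_id is missing,
* so unlinked births are not treated as siblings of one another.
bysort maternal_id: gen id_match = _N - 1 if !missing(maternal_id)

* Same plausibility cap as in the command above.
gen parity = id_match if id_match <= 10

list, clean noobs
```

Every birth to the same mother gets the same value (2 for each of mother 101's three births). The variable therefore counts the other births to that mother on file, not each child's birth order.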
For the other match variables, these are the commands I'm using:
* egen group() is missing if any identifier is missing; exclude those records
egen dobmaidfirst = group(dateofbirth maidenname firstname)
duplicates tag dobmaidfirst if !missing(dobmaidfirst), gen(dobmaidfirstmatch)
replace parity = dobmaidfirstmatch if dobmaidfirstmatch<=4 & (parity==0 | parity==.)
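The fallback pass can be sketched the same way. The toy records below are invented, and `dateofbirth` is stored as a string only to keep the example short. Two details matter here. First, `egen group()` returns missing when any of its inputs is missing (`.` or `""`), so those records must be excluded before counting. Second, Stata groups logical conditions with parentheses; square brackets are reserved for subscripts and weights.

```stata
* Toy data, invented for illustration.
clear
input long maternal_id str10 dateofbirth str10 maidenname str10 firstname
101 "1975-03-02" "Smith" "Ann"
101 "1975-03-02" "Smith" "Ann"
.   "1980-07-15" "Jones" "Mary"
.   "1980-07-15" "Jones" "Mary"
.   "1982-01-30" "Brown" "Lisa"
.   "1979-11-11" ""      "Kate"
.   "1979-11-11" ""      "Kate"
end

* Pass 1: match on the maternal ID (unlinked births stay missing).
bysort maternal_id: gen id_match = _N - 1 if !missing(maternal_id)
gen parity = id_match if id_match <= 10

* Pass 2: match on date of birth, maiden name and first name.
* egen group() leaves dobmaidfirst missing if any of the three is
* missing, so those records are excluded rather than matched together.
egen dobmaidfirst = group(dateofbirth maidenname firstname)
bysort dobmaidfirst: gen dobmaidfirstmatch = _N - 1 if !missing(dobmaidfirst)

* Fill parity only where pass 1 found no match; note the parentheses.
replace parity = dobmaidfirstmatch if dobmaidfirstmatch <= 4 & (parity == 0 | missing(parity))

list maternal_id maidenname firstname parity, clean noobs
```

In this toy run the two Smith births keep their value from pass 1. Pass 2 fills in the Jones pair (1) and Brown (0). The two Kate records stay missing because their maiden name is blank.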
Does anyone know whether this is a common approach to matching individuals
within the same dataset? I know there are also probabilistic
methods, which I may try later, but first I need to get this approach working.
Thank you,
Nathan
--
Doctoral Student • Columbia University School of Social Work
Fellow • Columbia Population Research Center