Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Teresio Poggio <terlist@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Merging datasets with non-unique identifiers
Date: Wed, 23 Nov 2011 11:14:30 +0100
Dear Mary Ann,

the files have a hierarchical data structure: each individual in files 4 & 5 is linked to only one household, but several individuals in these files may belong to the same household.

If you are interested in doing your analysis at the individual level (children), you can attach the household-level variables to the individual-level ones with a many-to-one merge (merge m:1) instead of a one-to-one merge (merge 1:1). Use the household id as the key variable. See help merge for details.

If you are interested in doing your analysis at the household level, you first need to summarize the individual-level information at the household level using collapse (see help collapse for details) with a statistic that suits your purpose (count if you need a *number of children* variable; max if you need an *age of the eldest child* variable, etc.). You can then merge the resulting data file one-to-one with the household-level ones.

hth

Best regards
Teresio

On Wed, Nov 23, 2011 at 10:46 AM, Mary Ann Cruz Bautista <maryann.bautista@duke-nus.edu.sg> wrote:
> Dear all,
>
> I need to merge 6 datasets (from a survey which did not provide a coding manual):
>
> File1 Household level
> File2 Household level
> File3 Household level
> File4 Individual
> File5 Individual
>
> I only managed to merge files 1-4 using Household ID.
>
> File5 contains data with household ID and child number (this section of the questionnaire was answered by an individual from a selected household, but the responses do not refer to the respondent but to the child being reported about). How can I merge these datasets?
>
> File5 has non-unique identifiers with several observations having the same household ID.
>
>      +------------------------------------------------+
>      | hhid   childno   q5210   q5211   q5212   q5213 |
>      |------------------------------------------------|
>   1. |    1         1       .       .       .       . |
>   2. |    1         2       .       .       .       . |
>   3. |    1         3       .       .       .       . |
>   4. |    1         4       .       .       .       . |
>   5. |    1         5       .       .       .       . |
>      |------------------------------------------------|
>   6. |    1         6       .       .       .       . |
>   7. |    1         7     Yes      No      No      No |
>   8. |    1         8     Yes      No      No      No |
>   9. |    1         9       .       .       .       . |
>  10. |    2         1       .       .       .       . |
>      |------------------------------------------------|
>  11. |    2         2       .       .       .       . |
>  12. |    2         3       .       .       .       . |
>  13. |    2         4       .       .       .       . |
>  14. |    2         5      No       .      No      No |
>  15. |    2         6       .       .       .       . |
>      |------------------------------------------------|
>
> Merging File5 with the other files prompted that Household ID is not a unique identifier.
>
> [variable hhid does not uniquely identify observations in the master data]
>
> I'm an inexperienced Stata user and I'd be glad to get some help from more experienced people here. Thank you for accommodating my question.
>
> Best,
> Mary
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

--
____________________________________________________
dr. Teresio Poggio
LaboR - Dipartimento di Sociologia e ricerca sociale
Università degli studi di Trento
Via Verdi, 26
38100 Trento, Italy
Tel +39 0461/881406
fax: +39 0461/881348
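
The two routes described in the reply above (a many-to-one merge for a child-level analysis, or collapse followed by a one-to-one merge for a household-level analysis) might be sketched as the do-file below. It is only an illustration on made-up toy data: hhid and childno come from the original post, while region, age, nchildren and age_eldest are hypothetical names, and the tempfiles stand in for the real survey files. Run it as a whole do-file, since the tempfile names are local macros.

```stata
* Toy data only. hhid and childno are from the original post;
* region, age, nchildren and age_eldest are hypothetical names.

* Household-level file (stands in for files 1-3 once merged)
clear
input hhid region
1 10
2 20
end
isid hhid                       // one row per household
tempfile households
save `households'

* Child-level file (stands in for File5)
clear
input hhid childno age
1 1 12
1 2 9
1 3 4
2 1 7
2 2 3
end
isid hhid childno               // hhid alone is not unique, hhid + childno is
tempfile children
save `children'

* (a) Child-level analysis: many children (master) to one household (using)
merge m:1 hhid using `households'
tabulate _merge                 // inspect matched and unmatched rows
assert _merge == 3              // holds for the toy data; real data may differ
drop _merge

* (b) Household-level analysis: summarize children first, then merge 1:1
use `children', clear
* (count) counts nonmissing values of childno within each household
collapse (count) nchildren = childno (max) age_eldest = age, by(hhid)
merge 1:1 hhid using `households'
assert _merge == 3
drop _merge
list hhid nchildren age_eldest region
```

With the real files, the tempfiles would be replaced by the actual file names. The error quoted in the question, that hhid does not uniquely identify observations in the master data, is what merge reports when the child-level file is in memory and the merge type requires hhid to be unique there, which is why route (a) uses m:1.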