Friedrich Huebler <[email protected]> has the following data:
    hh   id   sex   age   mid   fid
    ---------------------------------
     1    1     1    30     .     .
     1    2     2    30     .     .
     1    3     1     5     2     1
     2    1     1    30     .     .
     2    3     2     5     2     1
In this data, mid is the id-within-hh of the mother.
Friedrich wants to add a new variable, mage, the age of the mother, so that
we would have
    hh   id   sex   age   mid   fid   mage
    ----------------------------------------
     1    1     1    30     .     .      .
     1    2     2    30     .     .      .
     1    3     1     5     2     1     30
     2    1     1    30     .     .      .
     2    3     2     5     2     1      .   <- mage=. because id=2
                                                does not exist
The solution is to create a dataset containing hh, id, and age, and
then to merge to obtain mother's age.
Call the original data master.dta.
First, I verify something Friedrich has implied, namely that hh and id are
never missing and that together they uniquely identify the observations:
. use master, clear
. assert hh!=.
. assert id!=.
. sort hh id
. by hh id: assert _n==1
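The same check can probably be written more compactly with isid, which
exits with an error unless the listed variables uniquely identify the
observations and, by default, contain no missing values. This is only a
sketch, and it assumes the person identifier is named id, as in the
listing above:
. use master, clear
. isid hh id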
That out of the way, I create the data that I will use to merge.
In this dataset, everybody is a potential mother:
. use master, clear
. keep hh id age
. rename id mid
. rename age mage
. sort hh mid
. save tomerge, replace
Now I merge the original dataset with the data I just created. I merge on hh
and mid, and I keep only the original observations (the nokeep option drops
observations that appear only in tomerge). Note that this is potentially a
many-to-one merge, because more than one child in a household can have the
same mother. There's nothing special I need to do, however.
. use master, clear
. sort hh mid
. merge hh mid using tomerge, nokeep
. drop _merge
. sort hh id // put data back in original order
I'm done, except to clean up:
. erase tomerge.dta
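A note for readers on newer releases: in Stata 11 and later, merge has a
newer syntax that states the type of merge explicitly and does not require
the data to be sorted first. A sketch of the merge step under that
assumption (creating tomerge.dta and erasing it afterward are unchanged):
. use master, clear
. merge m:1 hh mid using tomerge, keep(master match) nogenerate
. sort hh id // put data back in original order
Here m:1 declares the many-to-one merge, keep(master match) plays the role
of nokeep, and nogenerate suppresses _merge, so there should be nothing to
drop afterward.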
-- Bill
[email protected]