RE: st: generating parent variable in child level data
From
Nick Cox <[email protected]>
To
"'[email protected]'" <[email protected]>
Subject
RE: st: generating parent variable in child level data
Date
Mon, 8 Nov 2010 12:15:51 +0000
This crossed with Eric Booth's solution, which in a way is about half-way between mine and Mitch Abdon's.
In my solution, the age for the mother, for example, is recorded for the mother herself and also for the father. Values in such observations can be ignored if of no interest or set to missing afterwards.
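Setting those values to missing afterwards might look like the sketch below. It is a minimal illustration, not part of the original posts: the three-row data set is made up, and it assumes -relation- is a string variable spelled exactly "mother" and "father", as in the example data in the question.

```stata
* Hypothetical toy data, modelled on the example in the question
clear
input hhid str8 relation age
1 "father" 40
1 "mother" 38
1 "son"    18
end

* Mother's age, spread to every member of the household
egen age_mother = mean(cond(relation == "mother", age, .)), by(hhid)

* Blank out the value on the parents' own rows, keeping it for children only
replace age_mother = . if inlist(relation, "mother", "father")

list, noobs
```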
For more on -cond()- if desired, see
SJ-5-3 pr0016 . . Depending on conditions: a tutorial on the cond() function
. . . . . . . . . . . . . . . . . . . . . . . D. Kantor and N. J. Cox
Q3/05 SJ 5(3):413--420 (no commands)
tutorial on the cond() function
Nick
Nick Cox
========
Here is another way to do it, without any loops:
egen age_mother = mean( cond(relation == "mother", age, .) ), by(hhid)
The way this works, reading inside out:
1. The expression
cond(relation == "mother", age, .)
yields the -age- of the mother when the person is the mother and missing otherwise.
2. The -egen- function -mean()- takes the mean of that expression, -by(hhid)-. As you would hope and expect, it ignores missings; if all the values in a household are missing, the result is missing.
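The two steps can be tried on a small data set. What follows is only a sketch: household 1 is copied from the question, household 2 (a father and son, no mother) is invented here to show the all-missing case, and the variable names other than -age_mother- are my own choices.

```stata
* Household 1 is from the question; household 2 is hypothetical
clear
input hhid str8 relation age education
1 "father"   40 8
1 "mother"   38 3
1 "son"      18 4
1 "son"      15 2
1 "daughter" 12 2
2 "father"   45 6
2 "son"      10 1
end

* Mother's age, spread to every member of the household
egen age_mother = mean(cond(relation == "mother", age, .)), by(hhid)

* The same device for the other parent-level variables
egen age_father  = mean(cond(relation == "father", age, .)), by(hhid)
egen educ_mother = mean(cond(relation == "mother", education, .)), by(hhid)
egen educ_father = mean(cond(relation == "father", education, .)), by(hhid)

list, sepby(hhid) noobs
```

In household 2 the mother variables come out missing, because every value fed to -mean()- is missing there.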
Now the implication, or perhaps inference, is that there should be at most one mother in each household. If that's true, then other -egen- functions will yield the same result, such as -min()- and -max()-.
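A quick way to see this, as a sketch on made-up data with exactly one mother in the household (the names -m_mean-, -m_min- and -m_max- are just for illustration):

```stata
* Hypothetical household with exactly one mother
clear
input hhid str8 relation age
1 "father" 40
1 "mother" 38
1 "son"    18
end

egen m_mean = mean(cond(relation == "mother", age, .)), by(hhid)
egen m_min  = min(cond(relation == "mother", age, .)), by(hhid)
egen m_max  = max(cond(relation == "mother", age, .)), by(hhid)

* With a single nonmissing value per household, all three must agree
assert m_mean == m_min & m_min == m_max
```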
Either way, you should check that it is true:
egen n_mothers = total(relation == "mother"), by(hhid)
If it's not true, then presumably you need to work out what you want for two or more mothers. (If there's no mother, the result is missing, as above.)
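One possible way to inspect that count is sketched below. The data are invented (household 2 deliberately has two mothers), and the variable name -tag- is my own choice; -tag()- is a standard -egen- function that marks one observation per group.

```stata
* Hypothetical data: household 2 has two mothers
clear
input hhid str8 relation age
1 "father" 40
1 "mother" 38
1 "son"    18
2 "mother" 50
2 "mother" 29
2 "son"     5
end

egen n_mothers = total(relation == "mother"), by(hhid)

* Count households (not people) with 0, 1, 2, ... mothers
egen tag = tag(hhid)
tabulate n_mothers if tag

* Show the households that break the at-most-one-mother assumption
list hhid relation age if n_mothers > 1, sepby(hhid) noobs
```

If you expect no exceptions at all, -assert n_mothers <= 1- will stop with an error as soon as one exists.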
By the way, a -by()- option for -egen- is supported, just no longer documented.
For other problems in this territory, see
FAQ . . Creating variables recording prop. of the other members of a group
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
4/05 How do I create variables summarizing for each
individual properties of the other members of a
group?
http://www.stata.com/support/faqs/data/members.html
FAQ . . Creating variables recording whether any or all possess some char.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
2/03 How do I create a variable recording whether any
members of a group (or all members of a group)
possess some characteristic?
http://www.stata.com/support/faqs/data/anyall.html
Nick
Mitch Abdon
===========
Here is one way of doing this:
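* Start with missing values; households lacking a parent stay missing
gen age_mother = .
gen age_father = .
gen educ_mother = .
gen educ_father = .
* Loop over the distinct household ids (hhid must be numeric for this to work)
levelsof hhid, local(levels)
foreach i of local levels {
    * Mother's age within household `i', copied to all its members
    qui sum age if relation == "mother" & hhid == `i'
    replace age_mother = r(mean) if hhid == `i'
    * Father's age
    qui sum age if relation == "father" & hhid == `i'
    replace age_father = r(mean) if hhid == `i'
    * Mother's education
    qui sum education if relation == "mother" & hhid == `i'
    replace educ_mother = r(mean) if hhid == `i'
    * Father's education
    qui sum education if relation == "father" & hhid == `i'
    replace educ_father = r(mean) if hhid == `i'
}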
If you don't need the rows for 'mother' and 'father', you can just drop them.
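Dropping the parent rows could be done as in this sketch. For brevity it builds the parent variables with the -egen- approach from the other reply rather than the loop above, and it assumes -relation- holds exactly the strings "mother" and "father"; the data are the example household from the question.

```stata
* Example household from the question
clear
input hhid str8 relation age education
1 "father"   40 8
1 "mother"   38 3
1 "son"      18 4
1 "son"      15 2
1 "daughter" 12 2
end

egen age_mother = mean(cond(relation == "mother", age, .)), by(hhid)
egen age_father = mean(cond(relation == "father", age, .)), by(hhid)

* Keep only the child rows; the parent values are already on them
drop if inlist(relation, "mother", "father")

list, noobs
```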
Shikha Sinha
============
> I have a household-level data set in which each household has father,
> mother, and child records as rows, something like the following:
>
> hhid  relation  age  education
> 1     father    40   8
> 1     mother    38   3
> 1     son       18   4
> 1     son       15   2
> 1     daughter  12   2
>
> I wish to generate parent-level variables (age, education) for every
> child in the same household. Please suggest.