When working with paired household/individual datasets, this is one of the most
useful FAQs:
http://www.stata.com/support/faqs/data/members.html
However, there are a few problems I couldn't solve with the information there,
or at least not efficiently. I would like to solve my problem and, if
worthwhile, write an extension of the FAQ. The problem is that sometimes you
need to record, for one individual, a property not of the whole group but of
one particular other member of the group.
In my example, I have a household survey with no direct information on the
number of children of each individual. Instead, I have something like the data
below. hhid and member are the household id and the member number. The
variables fatherm and motherm give the member number of the individual's father
and mother, if they live in the household (a dash means they do not):
hhid  member  fatherm  motherm
   1       1        -        -
   1       2        -        -
   1       3        1        2
   1       4        1        2
   1       5        1        2
   2       1        -        -
   2       2        -        1
   2       3        -        2
...
Household 1 is a couple with three children. Household 2 is a grandmother, her
daughter, and a grandchild.
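To experiment with the problem, the example can be typed straight into Stata.
This is a minimal sketch that assumes the dashes in the listing above stand for
missing values, entered here as Stata's system missing value "."; only the two
households shown are included.

```stata
* Example data: hhid = household id, member = member number,
* fatherm/motherm = member number of the father/mother if present
* (a dash in the listing is entered as missing, ".")
clear
input hhid member fatherm motherm
1 1 . .
1 2 . .
1 3 1 2
1 4 1 2
1 5 1 2
2 1 . .
2 2 . 1
2 3 . 2
end
list, sepby(hhid)
```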
I want to create a variable ownkids giving the number of each individual's own
children living in the household:
hhid  member  ownkids
   1       1        3
   1       2        3
   1       3        0
   1       4        0
   1       5        0
   2       1        1
   2       2        1
   2       3        0
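One way to get ownkids without comparing every pair of members is to turn the
question around: count how many rows point to each (hhid, motherm) and each
(hhid, fatherm) pair, then merge those counts back onto the parent's own row.
The sketch below is one possible approach, not the FAQ's method. It assumes
Stata 11 or later (for the merge 1:1 syntax) and uses temporary files; the
names persons, kids and nkids are illustrative only.

```stata
clear
input hhid member fatherm motherm
1 1 . .
1 2 . .
1 3 1 2
1 4 1 2
1 5 1 2
2 1 . .
2 2 . 1
2 3 . 2
end

tempfile persons kids
save `persons'

* Count children per mother: one row per (household, mother's member number)
keep hhid motherm
contract hhid motherm, freq(nkids) nomiss
rename motherm member
save `kids'

* Same for fathers, stacked under the mothers' counts
use hhid fatherm using `persons', clear
contract hhid fatherm, freq(nkids) nomiss
rename fatherm member
append using `kids'

* A person can appear as a mother and/or a father; sum to get all own children
collapse (sum) ownkids = nkids, by(hhid member)
save `kids', replace

* Attach the counts to every person; people with no children get 0
use `persons', clear
merge 1:1 hhid member using `kids', nogenerate keep(master match)
replace ownkids = 0 if missing(ownkids)
list hhid member ownkids, sepby(hhid)
```

The work here is a handful of dataset-level commands, so the run time should
grow roughly with the number of observations rather than with the number of
member pairs in each household.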
My brute-force solution makes many unnecessary comparisons and takes a very long
time (because it generates and drops many variables). It has the following
form, where group numbers the households from 1 to max and maxmem is the number
of members of household i:
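* Setup assumed by the loop: group numbers households 1..`max',
* and mkids/fkids start at 0
egen group = group(hhid)
qui sum group
local max = r(max)
gen mkids = 0
gen fkids = 0
forvalues i = 1/`max' {
    qui sum member if group==`i'
    local maxmem = r(max)
    di "-----------Household number `i', number of members: `maxmem'"
    forvalues j = 1/`maxmem' {
        forvalues k = 1/`maxmem' {
            di "Household `i', member `j', comparing with `k'"
            * is member `j' the mother of member `k'?
            qui gen a = motherm==`j' if member==`k' & group==`i'
            qui egen b = max(a)
            qui replace mkids = mkids + b if member==`j' & group==`i'
            drop a b
            * is member `j' the father of member `k'?
            qui gen a = fatherm==`j' if member==`k' & group==`i'
            qui egen b = max(a)
            qui replace fkids = fkids + b if member==`j' & group==`i'
            drop a b
        }
    }
}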
This creates two variables, mkids and fkids, which count each individual's
children as a mother and as a father. For each member j of a household, I check
every member k to see whether j is k's mother (or father). The gen/egen/replace/
drop cycle is very slow, and slower still when the dataset in memory is large
(I had to split the dataset into 25 parts to make this run faster).
The main problem (the main awkwardness in this program) is that I have to gen,
egen, etc., because I could not simply create a scalar holding the value of a
variable for one specific observation. I wanted something of this form (which
of course doesn't work):
local a=motherm==`j' if member==`k'&group==`i'
meaning that motherm etc. should refer to the observation with
member==`k'&group==`i'. I couldn't use something like motherm[_...] because I
was not using by: ... .
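On the narrower question of reading one value from one observation: explicit
subscripts such as motherm[17] also work without by:, in which case the number
is the observation number in the whole dataset. It is therefore enough to find
the observation number of the row with member==`k' in household `i'. This is a
sketch under stated assumptions. group is created with egen, group(), the
values of i, j and k are purely illustrative, and obsno is a hypothetical helper
variable that must be created after the last sort.

```stata
clear
input hhid member fatherm motherm
1 1 . .
1 2 . .
1 3 1 2
1 4 1 2
1 5 1 2
2 1 . .
2 2 . 1
2 3 . 2
end
egen group = group(hhid)    // household index 1, 2, ...
gen long obsno = _n         // hypothetical helper: current observation number

* Illustrative values: household i, candidate parent j, candidate child k
local i 1
local j 2
local k 3

* Locate the single observation holding member k of household i
summarize obsno if member == `k' & group == `i', meanonly
local nfound = r(N)
local obs = r(min)
assert `nfound' == 1

* Read one value from that observation, no gen/egen/drop needed
local a = (motherm[`obs'] == `j')
display "is member `j' the mother of member `k' in household `i'? `a'"
```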
What I would like to know is whether there are more efficient ways of doing
this (I'm sure there are!).
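One possibly faster route keeps the pairwise logic but drops the gen/egen/drop
cycle. It loops over positions within the household instead of over
households. Under by hhid:, an explicit subscript such as motherm[3] refers to
the third observation of the current household and is missing if the household
has fewer members. One replace per position therefore updates every household
at once, and the number of replace commands depends on the largest household
size, not on the number of households. A minimal sketch, assuming the variable
names of the example; hhsize and maxsize are illustrative names:

```stata
clear
input hhid member fatherm motherm
1 1 . .
1 2 . .
1 3 1 2
1 4 1 2
1 5 1 2
2 1 . .
2 2 . 1
2 3 . 2
end

* Sort members within households and record each household's size
bysort hhid (member): gen hhsize = _N
summarize hhsize, meanonly
local maxsize = r(max)

* For each within-household position p, add 1 to a member's count if that
* member is the mother or the father of the person at position p.
* Beyond the household size motherm[`p'] is missing, and missing == member
* is false, so those positions add nothing.
gen ownkids = 0
forvalues p = 1/`maxsize' {
    by hhid: replace ownkids = ownkids + (motherm[`p'] == member | fatherm[`p'] == member)
}
list hhid member ownkids, sepby(hhid)
```

Splitting the two comparisons into separate replace commands would give mkids
and fkids instead of their total.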
Thank you all
***************************
Guillermo Cruces