Asad's sample data will help fix ideas:
hhold  id  s_id  age  s_age
   23   2     1   30     35
   23   1     2   35     30
   23   4     .   65      .
   23   3     .    3      .
   45   2     1   50     40
   45   1     2   40     50
   45   6     2   30     50
   45   8     .    5      .
   45   5     .    5      .
   45   4     .    8      .
   45   7     .    2      .
   45   3     .   12      .
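To follow along, the sample data can be typed in with -input-. This is only a convenience sketch; the values are exactly those listed above, and "." is read as numeric missing.

```stata
* read the sample data into memory (all variables numeric)
clear
input hhold id s_id age s_age
23 2 1 30 35
23 1 2 35 30
23 4 . 65 .
23 3 . 3 .
45 2 1 50 40
45 1 2 40 50
45 6 2 30 50
45 8 . 5 .
45 5 . 5 .
45 4 . 8 .
45 7 . 2 .
45 3 . 12 .
end
```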
Here is a revised stab at the problem:
1. Count how many times each spouse identifier
occurs within each household:
. bysort hhold s_id : gen ns = _N * (s_id < .)
2. If any spouse identifier occurs more than
once, the individuals must be wives with
the same husband. The person they have
in common, their husband, is the appropriate
identifier for a group of husband and wives.
(No sexism here; it's only a choice made
to keep the problem tractable.)
. gen g_id = s_id if ns > 1
3. At this point we are confident of the
status of those groups of wives and also
of unmarried individuals. We will label
those OK. (That is, OK = 1 for these
and OK = 0 for others.)
. gen OK = g_id < . | s_id == .
4. We are left with
a. husbands with two or more wives
b. husbands and wives who are just
married to each other (monogamous
couples).
We'll take them one at a time.
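5. In the case of a. the appropriate
group identifier is that of the individual
concerned (a husband, so his id is already
in use as a group identifier for his wives,
from step 2). The command below also gives
members of monogamous couples their own id
for now; step 7 puts that right.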
. replace g_id = id if g_id == . & s_id < .
6. That was the right thing to do for husbands
whose id was already in use as a group
identifier from step 2. Their wives have been
tagged as -OK-, so the husbands should be too.
More generally, the right value of
-OK- is the largest so far assigned
for each group identifier.
. bysort hhold g_id (OK) : replace OK = OK[_N]
7. The only individuals now to be assigned
are monogamous couples. These are tagged
by -OK- of 0. A systematic
way to identify them is by assigning
the minimum of the two ids. That
is, if person 1 has spouse 2 and
person 2 has spouse 1, then we
can give them both group
identifiers of min(1,2) i.e. 1.
It doesn't matter whether the
group identifier is that of the
husband or the wife as none of
the individuals has any other spouse.
. replace g_id = min(id,s_id) if OK == 0
hhold  id  s_id  ns  g_id
   23   1     2   1     1
   23   2     1   1     1
   23   3     .   .     .
   23   4     .   .     .
   45   2     1   1     2
   45   1     2   2     2
   45   6     2   2     2
   45   5     .   .     .
   45   8     .   .     .
   45   4     .   .     .
   45   7     .   .     .
   45   3     .   .     .
So, we have now identified groups within households,
of one husband and one or more wives.
Although the example contains no households with
more than one husband, the method should apply
to those as well.
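A quick check on the result, offered as a sketch rather than as part of the method: under the assumptions made here, every group should contain exactly one person whose own id is the group identifier. The variable name nhead is my own invention for illustration.

```stata
* within each group, count the people whose id equals the group identifier
bysort hhold g_id : egen nhead = total(id == g_id) if g_id < .
* -assert- stops with an error if any group has none, or more than one
assert nhead == 1 if g_id < .
* inspect the groups household by household
list hhold id s_id ns g_id, sepby(hhold)
```

If the -assert- fails, the spouse identifiers in that household do not fit the pattern assumed in steps 1 to 7 and need a closer look.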
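To get mean ages, first distinguish the person whose
id is the group identifier (partner = 0) from everyone
else in the group (partner = 1). Sorting on that, rather
than on -ns-, keeps the two members of a monogamous
couple apart, as both have ns of 1.
. gen byte partner = id != g_id
. bysort hhold g_id partner : egen meanage = mean(age) if ns
. by hhold g_id : gen S_age = meanage[2] if _n == 1
. by hhold g_id : replace S_age = meanage[1] if S_age == .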
Nick