Daniel Macneil <[email protected]> writes,
> I am stumped on what i thought was an easy problem. I have children
> (id_code) within families(household_code) who go to school (1) or not(0).
> How do I find the mean values (or even table) at the household level for how
> many children are in school? I know I can use xtsum to get the std dev
> within and between, but about just finding out: the number of families with
> ALL children in school, SOME children in school and NO children in school,
> e.g.
>
>
> +--------------------------------------------+
> | househ~e id_code sex age school |
> |--------------------------------------------|
> 4. | 1 4 Female 6 0 |
> 9. | 2 3 Female 12 0 |
> 10. | 2 4 Male 8 1 |
> 14. | 3 3 Female 12 0 |
> 15. | 3 4 Male 10 1 |
> |--------------------------------------------|
> 16. | 3 5 Male 14 1 |
> 23. | 4 3 Female 14 0 |
> 24. | 4 4 Female 12 1 |
> 25. | 4 5 Female 6 0 |
> 32. | 5 5 Female 13 . |
> |--------------------------------------------|
First, let's get the number of children in school in the last observation
of each household. In other observations, the new variable will be missing.
We will have a new variable, n:
+-------------------------------------------------+
| househ~e id_code sex age school n |
|-------------------------------------------------|
4. | 1 4 Female 6 0 0 |
9. | 2 3 Female 12 0 . |
10. | 2 4 Male 8 1 1 |
14. | 3 3 Female 12 0 . |
15. | 3 4 Male 10 1 . |
|-------------------------------------------------|
16. | 3 5 Male 14 1 2 |
23. | 4 3 Female 14 0 . |
24. | 4 4 Female 12 1 . |
25. | 4 5 Female 6 0 1 |
|-------------------------------------------------|
. sort household
. by household: gen n = cond(_n==_N, sum(school), .)
That may be too tricky, so here's another way following the same logic:
. sort household
. by household: gen n = sum(school)
. by household: replace n = . if _n<_N
Now we can obtain the average number of students households have in school:
. summarize n
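To try these commands on something concrete, here is a minimal do-file sketch that types in the nine rows of Daniel's listing that have a nonmissing school value. Treating those rows as the complete dataset is an assumption (the gaps in the observation numbers suggest the real data contain more rows), and sex and age are left out because they are not used.

```stata
* Sketch only: the rows below are copied from the listing in the question
* and are assumed, for illustration, to be the whole dataset.
clear
input household id_code school
1 4 0
2 3 0
2 4 1
3 3 0
3 4 1
3 5 1
4 3 0
4 4 1
4 5 0
end

sort household
* Running sum of school within household, kept only in the last observation
by household: gen n = cond(_n==_N, sum(school), .)
list, sepby(household)
summarize n
```

Because the running sum starts at zero, a household with no children in school ends up with n = 0 in its last observation.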
Now let's get the number and fraction of families with ALL children in
school, SOME but not ALL in school, and NO children in school:
. by household: gen all = (n==_N) if _n==_N
. by household: gen some = (n<_N & n>0) if _n==_N
. by household: gen none = (n==0) if _n==_N
With that, we can get the fractions via,
. summarize all some none
or we can get counts via
. tabulate all
. tabulate some
. tabulate none
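If a single table is preferred to three, the same information can be folded into one three-level variable. This is only a sketch: the variable name status and its value label are hypothetical, and it assumes n exists as created above and that the data are still sorted by household.

```stata
* One categorical variable, defined only in the last observation of each household:
* 0 = no children in school, 1 = some but not all, 2 = all
by household: gen status = cond(n==0, 0, cond(n==_N, 2, 1)) if _n==_N
label define status 0 "none" 1 "some" 2 "all"
label values status status
tabulate status
```

The single tabulate then reports the count and percentage of households in each category.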
In all of the above, I went to extra work to ensure that the variables
were defined in only the last observation of each household. That ensured
the means and counts were properly weighted to represent households.
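One thing to keep in mind is that sum() treats a missing value as zero, while _N counts every observation in the household. A child whose school value is missing, such as observation 32 in Daniel's listing, is therefore counted as a child who is not in school. If that is not wanted, a variant along the following lines counts only children whose school value is recorded. This is a sketch for a do-file, and nkids, nin, all2, some2, and none2 are hypothetical names.

```stata
sort household
by household: gen nkids = sum(!missing(school))   // children with school recorded
by household: gen nin   = sum(school == 1)        // children recorded as in school

* Define the dummies in the last observation only, and only for households
* with at least one child whose school value is recorded
by household: gen all2  = (nin == nkids)          if _n == _N & nkids > 0
by household: gen some2 = (nin > 0 & nin < nkids) if _n == _N & nkids > 0
by household: gen none2 = (nin == 0)              if _n == _N & nkids > 0
summarize all2 some2 none2
```

Households in which every child has a missing school value are left out of all three dummies.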
Now let's assume that Daniel wants the variables defined for every
observation. Perhaps he wants to fit a regression in which the
explanatory variables are the fraction in school, or dummies for all,
some, or none.
First, I'll get the fraction:
. by household: gen f = n/_N
. by household: replace f = f[_N]
Now I'll just fill the dummies:
. by household: replace all = all[_N]
. by household: replace some = some[_N]
. by household: replace none = none[_N]
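As a cross-check on the fraction, egen can compute a household mean of school in one step. This swaps in a different technique from the running-sum approach used above, and fmean is a hypothetical name.

```stata
* Household mean of school, stored in every observation of the household.
* egen's mean() ignores missing values of school, so fmean can differ from
* n/_N in households where school is missing for some child.
bysort household: egen fmean = mean(school)
summarize f fmean
```

Where school is never missing, the two fractions should agree.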
-- Bill
[email protected]