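On September 28th, I posted something quite similar entitled "Collapse & Missing Values". A few people chimed in with their thoughts & ideas. In short, there is no easy way around this problem, but here is a workaround that I think does the job: count the missing and nonmissing incomes within each family, then set the family total back to missing wherever a family has no nonmissing incomes at all.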
. l
     +-----------------+
     | family   income |
     |-----------------|
  1. |      1        . |
  2. |      1        . |
  3. |      1        . |
  4. |      2    75000 |
  5. |      2    87000 |
     |-----------------|
  6. |      2        . |
  7. |      3        . |
  8. |      3        . |
     +-----------------+
. egen fam_inc = total(income), by(family)
. egen no_miss = total(cond(income==.,1,0)), by(family)
. egen no_nmiss = total(cond(income~=.,1,0)), by(family)
. l
     +------------------------------------------------+
     | family   income   fam_inc   no_miss   no_nmiss |
     |------------------------------------------------|
  1. |      1        .         0         3          0 |
  2. |      1        .         0         3          0 |
  3. |      1        .         0         3          0 |
  4. |      2    75000    162000         1          2 |
  5. |      2    87000    162000         1          2 |
     |------------------------------------------------|
  6. |      2        .    162000         1          2 |
  7. |      3        .         0         2          0 |
  8. |      3        .         0         2          0 |
     +------------------------------------------------+
. replace fam_inc = . if fam_inc == 0 & no_miss > 0 & no_nmiss == 0
(5 real changes made, 5 to missing)
. l
     +------------------------------------------------+
     | family   income   fam_inc   no_miss   no_nmiss |
     |------------------------------------------------|
  1. |      1        .         .         3          0 |
  2. |      1        .         .         3          0 |
  3. |      1        .         .         3          0 |
  4. |      2    75000    162000         1          2 |
  5. |      2    87000    162000         1          2 |
     |------------------------------------------------|
  6. |      2        .    162000         1          2 |
  7. |      3        .         .         2          0 |
  8. |      3        .         .         2          0 |
     +------------------------------------------------+
Kind of kludgy, I know. What I'd really like is for Stata to at least offer an option on collapse & egen that keeps an all-missing group's total as missing rather than zero, but Nick Cox rather dashed my hopes on that front. But perhaps someone can write a routine that would automate this?
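
For what it's worth, here is a minimal sketch of what such a routine might look like. The program name total_keepmiss and its generate() option are made up for illustration and are not part of official Stata. It assumes a Stata release whose egen provides total() and count(), and it uses a single count of nonmissing values in place of the two counters above:

```stata
* Hypothetical helper (not an official command): group total that stays
* missing when every value in the group is missing.
capture program drop total_keepmiss
program define total_keepmiss
    * by() and generate() are both required options in this sketch
    syntax varname(numeric), GENerate(name) BY(varlist)
    tempvar n_ok
    * count() returns the number of nonmissing values within each group
    egen `n_ok' = count(`varlist'), by(`by')
    * total() treats missing as zero, so an all-missing group sums to 0
    egen `generate' = total(`varlist'), by(`by')
    * a total built from no nonmissing values is "not reported", not zero
    replace `generate' = . if `n_ok' == 0
end

* Usage with the example data above
total_keepmiss income, gen(fam_inc) by(family)
```

Testing the count of nonmissing values, rather than fam_inc == 0, means a family whose reported incomes genuinely sum to zero keeps its zero.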
Best of luck,
Eric
>Hi, all.
>
>I'm using US 2000 Census data (IPUMS version, with my edits). I've hit upon an issue I don't find much <help> on: how to preserve missing values when these are qualitatively different from zero values when using an <egen> function.
>
>I have individual-level income data (inctot2) that I want to aggregate within families (famunt2) in a household (serial):
>
>egen ftoty=sum(inctot2), by(serial famunt2)
>
>The issue: ftoty is zero, even when all family members have inctot2==. (i.e., not reported, for example, due to age). In my application (determining family income relative to a poverty threshold) zero family income is very different from nonreported family income.
>
>One work-around is to use the !missing(varname) construction, which sets ftoty to missing for any person with missing inctot2:
>
> egen ftoty=sum(inctot2) if !missing(inctot2), by(serial famunt2)
>
>The drawback to this approach is that I must go back and assign non-missing values of ftoty to individuals for whom ftoty is missing, but who live in a family where other individuals report a valid income value.
>
>Is there a better way to approach this problem?
>
>Best, Deborah Garvey
>
>******************************
>Deborah Garvey, Ph.D.
>Department of Economics
>Kenna Hall
>Santa Clara University
>Santa Clara, CA 95053
>408/554-5580
>408/554-2331 (FAX)
>[email protected]
>http://lsb.scu.edu/~dgarvey
>**********************************
>
>
>
>*
>* For searches and help try:
>* http://www.stata.com/support/faqs/res/findit.html
>* http://www.stata.com/support/statalist/faq
>* http://www.ats.ucla.edu/stat/stata/
--
===================================================
Eric G. Wruck
Econalytics
2535 Sherwood Road
Columbus, OH 43209
ph: 614.231.5034
cell: 614.330.8846
eFax: 614.573.6639
eMail: [email protected]
website: http://www.econalytics.com
====================================================