OK. Consider this toy dataset:
. l
+------------------+
| response group |
|------------------|
1. | 1 1 |
2. | 2 1 |
3. | 3 2 |
4. | . 2 |
5. | 0 3 |
|------------------|
6. | . 3 |
7. | . 4 |
8. | . 4 |
+------------------+
I'd regard 0 as a sensible sum for group
3 but not for group 4, but as you say
. collapse (sum) response, by(group)
. l
+------------------+
| group response |
|------------------|
1. | 1 3 |
2. | 2 3 |
3. | 3 0 |
4. | 4 0 |
+------------------+
One way round this is to tag groups
for which _all_ values are missing.
One way to do that is
. bysort group (response) : gen allmissing = mi(response[1])
That is, within groups, sort on response. If the first
value within each group is missing, then all of them
in that group must be.
The principle here is close to one explained at
http://www.stata.com/support/faqs/data/diff.html
You might also like to look at
http://www.stata.com/support/faqs/data/anyall.html
. l
+-----------------------------+
| response group allmis~g |
|-----------------------------|
1. | 1 1 0 |
2. | 2 1 0 |
3. | 3 2 0 |
4. | . 2 0 |
5. | 0 3 0 |
|-----------------------------|
6. | . 3 0 |
7. | . 4 1 |
8. | . 4 1 |
+-----------------------------+
Now we make sure the -collapse- carries along
the important information:
. collapse (sum) response (min) allmissing, by(group)
. l
+-----------------------------+
| group response allmis~g |
|-----------------------------|
1. | 1 3 0 |
2. | 2 3 0 |
3. | 3 0 0 |
4. | 4 0 1 |
+-----------------------------+
and convert back:
. replace response = . if allmissing
(1 real change made, 1 to missing)
. l
+-----------------------------+
| group response allmis~g |
|-----------------------------|
1. | 1 3 0 |
2. | 2 3 0 |
3. | 3 0 0 |
4. | 4 . 1 |
+-----------------------------+
Nick
[email protected]
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of Monica L.
> Parra Torrado
> Sent: 28 July 2004 01:24
> To: [email protected]
> Subject: st: RE: RE: collapse missing values into zeroes when using
> (sum)
>
>
> I would like to keep them as missing because I need to be able to
> distinguish between actual 0s and missing values.
> Thanks,
> Monica L.
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Tuesday, July 27, 2004 7:51 PM
> To: [email protected]
> Subject: st: RE: collapse missing values into zeroes when using (sum)
>
>
> You say that you don't want missings to be
> included in the -collapse-, as they are
> regarded as summing to 0.
>
> What do you want Stata to do instead?
>
> Nick
> [email protected]
>
> Monica L. Parra Torrado
>
> > I have information at cities and state level but want to
> > aggregate it at the
> > state level using weighted averages (in a very manual way).
> > For one of the
> > steps I am using the command "collapse (sum) variables, by
> > (state year)" but
> > then I am getting zeroes where there were missing values
> > because of the sum
> > function (it does not happen with the other collapse
> > functions like sd,
> > mean, etc). I tried to use the option "cw" but since the
> > missing values are
> > not for the same observation among the different variables I
> > end up having
> > no observations at all.
> > So I would like to know if there is a way I could tell Stata
> > not to collapse
> > missing values into zeroes when using (sum).
>
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/