Below my signature is the key part of the posting referred to.
You can tweak that approach by
bysort youthid group (duration) : gen allmissing = missing(duration[1])
and then -collapse- as before.
Nick
[email protected]
OK. Consider this toy dataset:
. l
+------------------+
| response group |
|------------------|
1. | 1 1 |
2. | 2 1 |
3. | 3 2 |
4. | . 2 |
5. | 0 3 |
|------------------|
6. | . 3 |
7. | . 4 |
8. | . 4 |
+------------------+
I'd regard 0 as a sensible sum for group
3 but not for group 4, but as you say
. collapse (sum) response, by(group)
. l
+------------------+
| group response |
|------------------|
1. | 1 3 |
2. | 2 3 |
3. | 3 0 |
4. | 4 0 |
+------------------+
One way round this is to tag groups
for which _all_ values are missing.
One way to do that is
. bysort group (response) : gen allmissing = mi(response[1])
That is, within groups, sort on response. If the first
value within each group is missing, then all of them
in that group must be.
The principle here is close to one explained at
http://www.stata.com/support/faqs/data/diff.html
You might also like to look at
http://www.stata.com/support/faqs/data/anyall.html
. l
+-----------------------------+
| response group allmis~g |
|-----------------------------|
1. | 1 1 0 |
2. | 2 1 0 |
3. | 3 2 0 |
4. | . 2 0 |
5. | 0 3 0 |
|-----------------------------|
6. | . 3 0 |
7. | . 4 1 |
8. | . 4 1 |
+-----------------------------+
Now we make sure the -collapse- carries along
the important information:
. collapse (sum) response (min) allmissing, by(group)
. l
+-----------------------------+
| group response allmis~g |
|-----------------------------|
1. | 1 3 0 |
2. | 2 3 0 |
3. | 3 0 0 |
4. | 4 0 1 |
+-----------------------------+
and convert back:
. replace response = . if allmissing
(1 real change made, 1 to missing)
. l
+-----------------------------+
| group response allmis~g |
|-----------------------------|
1. | 1 3 0 |
2. | 2 3 0 |
3. | 3 0 0 |
4. | 4 . 1 |
+-----------------------------+
Melonie Sullivan
I have data on history of placements into different
-group-s by -youthid-: there are multiple placement
records for each youth. I need to create a variable
equal to the sum of -duration- of all placements into
each -group- for each youth. -collapse (sum)- seems to be
the appropriate procedure, but it treats missing
values as zeroes. This causes a problem if a youth has
only one placement in a given group with unknown
duration. Example:
youthid group duration
11 1 15
11 1 .
11 2 31
11 2 10
11 5 .
12 2 5
12 2 8
12 4 42
12 6 55
I create a duration variable for each group (-generate
grp1dur = duration if group==1-, etc.) and -collapse
(sum)- by -youthid- and I want to get this:
youthid 11 12
grp1dur 15 0
grp2dur 41 13
grp3ddur 0 0
grp4dur 0 42
grp5dur . 0
grp6dur 0 55
But collapse gives me a zero on grp5dur for youth #11,
though youth #11 had placement in that group, albeit
of an unknown duration. The other zeroes are correct;
the youth had zero days in that placement group.
The problem has been addressed here before, best in
the following post by Nick Cox:
http://www.stata.com/statalist/archive/2004-07/msg00783.html
However, this is not solving my particular problem,
because my data essentially looks like a big stack of
Nick's "toy datasets" -- one for each of 1800 youth in
my data. So collapsing by (youthid) gives the same
value of Nick's allmissing for each youth, since the
allmissing tags missing durations for groups within
youths.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/