Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: preserving missing values in collapse (sum)


From   Melonie Sullivan <[email protected]>
To   [email protected]
Subject   Re: st: preserving missing values in collapse (sum)
Date   Mon, 22 Oct 2007 06:51:29 -0700 (PDT)

Thank you, Nick. I had considered that tweak. However,
I need to collapse by (youthid), which then gives me a
value of allmissing for each youth, but does not tag
the missing group for replacement:

youthid   11    12
grp1dur   15     0
grp2dur   41    13
grp3ddur   0     0
grp4dur    0    42
grp5dur    0     0
grp6dur    0    55
allmissing 1     0
 
Melonie


--- n j cox <[email protected]> wrote:

> Below my signature is the key part of the posting
> referred to.
> 
> You can tweak that approach by
> 
> bysort youthid group (duration) : gen allmissing =
> missing(duration[1])
> 
> and then -collapse- as before.
> 
> Nick
> [email protected]
> 
> OK. Consider this toy dataset:
> 
> . l
> 
>       +------------------+
>       | response   group |
>       |------------------|
>    1. |        1       1 |
>    2. |        2       1 |
>    3. |        3       2 |
>    4. |        .       2 |
>    5. |        0       3 |
>       |------------------|
>    6. |        .       3 |
>    7. |        .       4 |
>    8. |        .       4 |
>       +------------------+
> 
> I'd regard 0 as a sensible sum for group
> 3 but not for group 4, but as you say
> 
> . collapse (sum) response, by(group)
> 
> . l
> 
>       +------------------+
>       | group   response |
>       |------------------|
>    1. |     1          3 |
>    2. |     2          3 |
>    3. |     3          0 |
>    4. |     4          0 |
>       +------------------+
> 
> One way round this is to tag groups
> for which _all_ values are missing.
> One way to do that is
> 
> . bysort group (response) : gen allmissing =
> mi(response[1])
> 
> That is, within groups, sort on response. If the
> first
> value within each group is missing, then all of them
> in that group must be.
> 
> The principle here is close to one explained at
> http://www.stata.com/support/faqs/data/diff.html
> You might also like to look at
> http://www.stata.com/support/faqs/data/anyall.html
> 
> . l
> 
>       +-----------------------------+
>       | response   group   allmis~g |
>       |-----------------------------|
>    1. |        1       1          0 |
>    2. |        2       1          0 |
>    3. |        3       2          0 |
>    4. |        .       2          0 |
>    5. |        0       3          0 |
>       |-----------------------------|
>    6. |        .       3          0 |
>    7. |        .       4          1 |
>    8. |        .       4          1 |
>       +-----------------------------+
> 
> Now we make sure the -collapse- carries along
> the important information:
> 
> . collapse (sum) response (min) allmissing,
> by(group)
> 
> . l
> 
>       +-----------------------------+
>       | group   response   allmis~g |
>       |-----------------------------|
>    1. |     1          3          0 |
>    2. |     2          3          0 |
>    3. |     3          0          0 |
>    4. |     4          0          1 |
>       +-----------------------------+
> 
> and convert back:
> 
> . replace response = . if allmissing
> (1 real change made, 1 to missing)
> 
> . l
> 
>       +-----------------------------+
>       | group   response   allmis~g |
>       |-----------------------------|
>    1. |     1          3          0 |
>    2. |     2          3          0 |
>    3. |     3          0          0 |
>    4. |     4          .          1 |
>       +-----------------------------+
> 
> Melonie Sullivan
> 
> I have data on history of placements into different
> -group-s by -youthid-: there are multiple placement
> records for each youth. I need to create a variable
> equal to the sum of -duration- of all placements
> into
> each -group- for each youth. -collapse (sum)- seems
> to be
> the appropriate procedure, but it treats missing
> values as zeroes. This causes a problem if a youth
> has
> only one placement in a given group with unknown
> duration. Example:
> 
> 
> youthid         group        duration
> 11                 1            15
> 11                 1             .
> 11                 2            31
> 11                 2            10
> 11                 5             .
> 12                 2             5
> 12                 2             8
> 12                 4            42
> 12                 6            55
> 
> I create a duration variable for each group
> (-generate
> grp1dur = duration if group==1-, etc.) and -collapse
> (sum)- by -youthid- and I want to get this:
> 
> youthid   11    12
> grp1dur   15     0
> grp2dur   41    13
> grp3ddur   0     0
> grp4dur    0    42
> grp5dur    .     0
> grp6dur    0    55
> 
> But collapse gives me a zero on grp5dur for youth
> #11,
> though youth #11 had placement in that group, albeit
> of an unknown duration. The other zeroes are
> correct;
> the youth had zero days in that placement group.
> 
> The problem has been addressed here before, best in
> the following post by Nick Cox:
> 
>
http://www.stata.com/statalist/archive/2004-07/msg00783.html
> 
> However, this is not solving my particular problem,
> because my data essentially looks like a big stack
> of
> Nick's "toy datasets" -- one for each of 1800 youth
> in
> my data. So collapsing by (youthid) gives the same
> value of Nick's allmissing for each youth, since the
> allmissing tags missing durations for groups within
> youths.
> 
> 
> *
> *   For searches and help try:
> *  
> http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index