Okay, -collapse(sum) by (youthid group) results in
something like the matrix below for each youth. But I
need a single record for each youth with sum of
durations for each group. If I understand your
solution, I then type a series of -replace x=. if
allmissing- where x=grp1dur, grp2dur, etc. But then I
need to collapse again to get a single sum for each
group for each youth, and I am back where I started.
youthid 11 11 11 11 11 11
grp1dur 15 0 0 0 0 0
grp2dur 0 41 0 0 0 0
grp3ddur 0 0 0 0 0 0
grp4dur 0 0 0 0 0 0
grp5dur 0 0 0 0 0 0
grp6dur 0 0 0 0 0 0
allmissing 0 0 0 0 1 0
--- n j cox <[email protected]> wrote:
> I don't think you mean what you say. Your output
> requires a -collapse, by(youthid group)-. Sorry
> that I didn't spell that out.
>
> Nick
> [email protected]
>
> Melonie wrote:
>
> Thank you, Nick. I had considered that tweak.
> However,
> I need to collapse by (youthid), which then gives me
> a
> value of allmissing for each youth, but does not tag
> the missing group for replacement:
>
> youthid 11 12
> grp1dur 15 0
> grp2dur 41 13
> grp3ddur 0 0
> grp4dur 0 42
> grp5dur 0 0
> grp6dur 0 55
> allmissing 1 0
>
> n j cox wrote:
>
> > Below my signature is the key part of the posting
> referred to.
> >
> > You can tweak that approach by
> >
> > bysort youthid group (duration) : gen allmissing =
> missing(duration[1])
> >
> > and then -collapse- as before.
> >
> > Nick
> > [email protected]
> >
> > OK. Consider this toy dataset:
> >
> > . l
> >
> > +------------------+
> > | response group |
> > |------------------|
> > 1. | 1 1 |
> > 2. | 2 1 |
> > 3. | 3 2 |
> > 4. | . 2 |
> > 5. | 0 3 |
> > |------------------|
> > 6. | . 3 |
> > 7. | . 4 |
> > 8. | . 4 |
> > +------------------+
> >
> > I'd regard 0 as a sensible sum for group
> > 3 but not for group 4, but as you say
> >
> > . collapse (sum) response, by(group)
> >
> > . l
> >
> > +------------------+
> > | group response |
> > |------------------|
> > 1. | 1 3 |
> > 2. | 2 3 |
> > 3. | 3 0 |
> > 4. | 4 0 |
> > +------------------+
> >
> > One way round this is to tag groups
> > for which _all_ values are missing.
> > One way to do that is
> >
> > . bysort group (response) : gen allmissing =
> mi(response[1])
> >
> > That is, within groups, sort on response. If the
> first
> > value within each group is missing, then all of
> them
> > in that group must be.
> >
> > The principle here is close to one explained at
> > http://www.stata.com/support/faqs/data/diff.html
> > You might also like to look at
> > http://www.stata.com/support/faqs/data/anyall.html
> >
> > . l
> >
> > +-----------------------------+
> > | response group allmis~g |
> > |-----------------------------|
> > 1. | 1 1 0 |
> > 2. | 2 1 0 |
> > 3. | 3 2 0 |
> > 4. | . 2 0 |
> > 5. | 0 3 0 |
> > |-----------------------------|
> > 6. | . 3 0 |
> > 7. | . 4 1 |
> > 8. | . 4 1 |
> > +-----------------------------+
> >
> > Now we make sure the -collapse- carries along
> > the important information:
> >
> > . collapse (sum) response (min) allmissing,
> by(group)
> >
> > . l
> >
> > +-----------------------------+
> > | group response allmis~g |
> > |-----------------------------|
> > 1. | 1 3 0 |
> > 2. | 2 3 0 |
> > 3. | 3 0 0 |
> > 4. | 4 0 1 |
> > +-----------------------------+
> >
> > and convert back:
> >
> > . replace response = . if allmissing
> > (1 real change made, 1 to missing)
> >
> > . l
> >
> > +-----------------------------+
> > | group response allmis~g |
> > |-----------------------------|
> > 1. | 1 3 0 |
> > 2. | 2 3 0 |
> > 3. | 3 0 0 |
> > 4. | 4 . 1 |
> > +-----------------------------+
> >
> > Melonie Sullivan
> >
> > I have data on history of placements into
> different
> > -group-s by -youthid-: there are multiple
> placement
> > records for each youth. I need to create a
> variable
> > equal to the sum of -duration- of all placements
> into
> > each -group- for each youth. -collapse (sum)-
> seems to be
> > the appropriate procedure, but it treats missing
> > values as zeroes. This causes a problem if a youth
> has
> > only one placement in a given group with unknown
> > duration. Example:
> >
> >
> > youthid group duration
> > 11 1 15
> > 11 1 .
> > 11 2 31
> > 11 2 10
> > 11 5 .
> > 12 2 5
> > 12 2 8
> > 12 4 42
> > 12 6 55
> >
> > I create a duration variable for each group
> (-generate
> > grp1dur = duration if group==1-, etc.) and
> -collapse
> > (sum)- by -youthid- and I want to get this:
> >
> > youthid 11 12
> > grp1dur 15 0
> > grp2dur 41 13
> > grp3ddur 0 0
> > grp4dur 0 42
> > grp5dur . 0
> > grp6dur 0 55
> >
> > But collapse gives me a zero on grp5dur for youth
> #11,
> > though youth #11 had placement in that group,
> albeit
> > of an unknown duration. The other zeroes are
> correct;
> > the youth had zero days in that placement group.
> >
> > The problem has been addressed here before, best
> in
> > the following post by Nick Cox:
> >
> >
>
http://www.stata.com/statalist/archive/2004-07/msg00783.html
> >
> > However, this is not solving my particular
> problem,
> > because my data essentially looks like a big stack
> of
> > Nick's "toy datasets" -- one for each of 1800
> youth in
> > my data. So collapsing by (youthid) gives the same
> > value of Nick's allmissing for each youth, since
> the
> > allmissing tags missing durations for groups
> within
> > youths.
> >
> >
> >
> *
> * For searches and help try:
> *
> http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/