Okay, so far so good, thanks. But now how do I get
that information into a matrix of this form - one line
for each youth:
youthid y x1 x2 x3 x4 x5 x6
11 0 15 41 0 0 . 0
12 1 0 13 0 42 0 55
where y=dependent variable, x1=duration if group=1,
x2=duration if group=2, etc. If I take your solution,
then generate x1, x2...., and do a -list- I still get
a 6x6 matrix of x for each youth that looks like this:
youthid x1 x2 x3 x4 x5 x6
11 15 . . . . .
11 . 41 . . . .
11 . . . . . .
11 . . . . . .
11 . . . . . .
11 . . . . . .
--- n j cox <[email protected]> wrote:
> As I understand it, you just need to recapitulate
> the 2004 thread, with only one difference:
> your groups are defined by two variables,
> not one.
>
> From your sample data, I would go
>
> . l
>
> +----------------------------+
> | youthid group duration |
> |----------------------------|
> 1. | 11 1 15 |
> 2. | 11 1 . |
> 3. | 11 2 31 |
> 4. | 11 2 10 |
> 5. | 11 5 . |
> |----------------------------|
> 6. | 12 2 5 |
> 7. | 12 2 8 |
> 8. | 12 4 42 |
> 9. | 12 6 55 |
> +----------------------------+
>
> . bysort youthid group (duration) : gen allmissing =
> missing(duration[1])
>
> . l
>
> +---------------------------------------+
> | youthid group duration allmis~g |
> |---------------------------------------|
> 1. | 11 1 15 0 |
> 2. | 11 1 . 0 |
> 3. | 11 2 10 0 |
> 4. | 11 2 31 0 |
> 5. | 11 5 . 1 |
> |---------------------------------------|
> 6. | 12 2 5 0 |
> 7. | 12 2 8 0 |
> 8. | 12 4 42 0 |
> 9. | 12 6 55 0 |
> +---------------------------------------+
>
> . collapse (sum) duration (min) allmissing ,
> by(youthid group)
>
> . l
>
> +---------------------------------------+
> | youthid group duration allmis~g |
> |---------------------------------------|
> 1. | 11 1 15 0 |
> 2. | 11 2 41 0 |
> 3. | 11 5 0 1 |
> 4. | 12 2 13 0 |
> 5. | 12 4 42 0 |
> |---------------------------------------|
> 6. | 12 6 55 0 |
> +---------------------------------------+
>
> . replace duration = . if allmissing
> (1 real change made, 1 to missing)
>
> . l
>
> +---------------------------------------+
> | youthid group duration allmis~g |
> |---------------------------------------|
> 1. | 11 1 15 0 |
> 2. | 11 2 41 0 |
> 3. | 11 5 . 1 |
> 4. | 12 2 13 0 |
> 5. | 12 4 42 0 |
> |---------------------------------------|
> 6. | 12 6 55 0 |
> +---------------------------------------+
>
> You may want to -fillin- to get closer to
> your matrix. In a real example, 3 would also
> appear, I guess.
>
> . fillin youthid group
>
> . l
>
>
> +-------------------------------------------------+
> | youthid group duration allmis~g
> _fillin |
>
> |-------------------------------------------------|
> 1. | 11 1 15 0
> 0 |
> 2. | 11 2 41 0
> 0 |
> 3. | 11 4 . .
> 1 |
> 4. | 11 5 . 1
> 0 |
> 5. | 11 6 . .
> 1 |
>
> |-------------------------------------------------|
> 6. | 12 1 . .
> 1 |
> 7. | 12 2 13 0
> 0 |
> 8. | 12 4 42 0
> 0 |
> 9. | 12 5 . .
> 1 |
> 10. | 12 6 55 0
> 0 |
>
> +-------------------------------------------------+
>
> n j cox wrote:
> > I don't think you mean what you say. Your output
> > requires a -collapse, by(youthid group)-. Sorry
> > that I didn't spell that out.
> >
> > Nick
> > [email protected]
> >
> > Melonie wrote:
> >
> > Thank you, Nick. I had considered that tweak.
> However,
> > I need to collapse by (youthid), which then gives
> me a
> > value of allmissing for each youth, but does not
> tag
> > the missing group for replacement:
> >
> > youthid 11 12
> > grp1dur 15 0
> > grp2dur 41 13
> > grp3ddur 0 0
> > grp4dur 0 42
> > grp5dur 0 0
> > grp6dur 0 55
> > allmissing 1 0
> >
> > n j cox wrote:
> >
> >> Below my signature is the key part of the posting
> referred to.
> >>
> >> You can tweak that approach by
> >>
> >> bysort youthid group (duration) : gen allmissing
> = missing(duration[1])
> >>
> >> and then -collapse- as before.
> >>
> >> Nick
> >> [email protected]
> >>
> >> OK. Consider this toy dataset:
> >>
> >> . l
> >>
> >> +------------------+
> >> | response group |
> >> |------------------|
> >> 1. | 1 1 |
> >> 2. | 2 1 |
> >> 3. | 3 2 |
> >> 4. | . 2 |
> >> 5. | 0 3 |
> >> |------------------|
> >> 6. | . 3 |
> >> 7. | . 4 |
> >> 8. | . 4 |
> >> +------------------+
> >>
> >> I'd regard 0 as a sensible sum for group
> >> 3 but not for group 4, but as you say
> >>
> >> . collapse (sum) response, by(group)
> >>
> >> . l
> >>
> >> +------------------+
> >> | group response |
> >> |------------------|
> >> 1. | 1 3 |
> >> 2. | 2 3 |
> >> 3. | 3 0 |
> >> 4. | 4 0 |
> >> +------------------+
> >>
> >> One way round this is to tag groups
> >> for which _all_ values are missing.
> >> One way to do that is
> >>
> >> . bysort group (response) : gen allmissing =
> mi(response[1])
> >>
> >> That is, within groups, sort on response. If the
> first
> >> value within each group is missing, then all of
> them
> >> in that group must be.
> >>
> >> The principle here is close to one explained at
> >> http://www.stata.com/support/faqs/data/diff.html
> >> You might also like to look at
> >>
> http://www.stata.com/support/faqs/data/anyall.html
> >>
> >> . l
> >>
> >> +-----------------------------+
> >> | response group allmis~g |
> >> |-----------------------------|
> >> 1. | 1 1 0 |
> >> 2. | 2 1 0 |
> >> 3. | 3 2 0 |
> >> 4. | . 2 0 |
> >> 5. | 0 3 0 |
> >> |-----------------------------|
> >> 6. | . 3 0 |
> >> 7. | . 4 1 |
> >> 8. | . 4 1 |
> >> +-----------------------------+
> >>
> >> Now we make sure the -collapse- carries along
> >> the important information:
> >>
> >> . collapse (sum) response (min) allmissing,
> by(group)
> >>
> >> . l
> >>
> >> +-----------------------------+
> >> | group response allmis~g |
> >> |-----------------------------|
> >> 1. | 1 3 0 |
> >> 2. | 2 3 0 |
> >> 3. | 3 0 0 |
> >> 4. | 4 0 1 |
> >> +-----------------------------+
> >>
> >> and convert back:
> >>
> >> . replace response = . if allmissing
> >> (1 real change made, 1 to missing)
> >>
> >> . l
> >>
> >> +-----------------------------+
> >> | group response allmis~g |
> >> |-----------------------------|
> >> 1. | 1 3 0 |
> >> 2. | 2 3 0 |
> >> 3. | 3 0 0 |
> >> 4. | 4 . 1 |
> >> +-----------------------------+
> >>
> >> Melonie Sullivan
> >>
> >> I have data on history of placements into
> different
> >> -group-s by -youthid-: there are multiple
> placement
> >> records for each youth. I need to create a
> variable
> >> equal to the sum of -duration- of all placements
> into
> >> each -group- for each youth. -collapse (sum)-
> seems to be
> >> the appropriate procedure, but it treats missing
> >> values as zeroes. This causes a problem if a
> youth has
> >> only one placement in a given group with unknown
> >> duration. Example:
> >>
> >>
> >> youthid group duration
> >> 11 1 15
> >> 11 1 .
> >> 11 2 31
> >> 11 2 10
> >> 11 5 .
> >> 12 2 5
> >> 12 2 8
> >> 12 4 42
> >> 12 6 55
> >>
> >> I create a duration variable for each group
> (-generate
> >> grp1dur = duration if group==1-, etc.) and
> -collapse
> >> (sum)- by -youthid- and I want to get this:
> >>
> >> youthid 11 12
> >> grp1dur 15 0
> >> grp2dur 41 13
> >> grp3ddur 0 0
> >> grp4dur 0 42
> >> grp5dur . 0
> >> grp6dur 0 55
> >>
> >> But collapse gives me a zero on grp5dur for youth
> #11,
> >> though youth #11 had placement in that group,
> albeit
> >> of an unknown duration. The other zeroes are
> correct;
> >> the youth had zero days in that placement group.
> >>
> >> The problem has been addressed here before, best
> in
> >> the following post by Nick Cox:
> >>
> >>
>
http://www.stata.com/statalist/archive/2004-07/msg00783.html
> >>
> >> However, this is not solving my particular
> problem,
> >> because my data essentially looks like a big
> stack of
> >> Nick's "toy datasets" -- one for each of 1800
> youth in
> >> my data. So collapsing by (youthid) gives the
> same
> >> value of Nick's allmissing for each youth, since
> the
> >> allmissing tags missing durations for groups
> within
> >> youths.
> >>
> >>
> >>
> >
> *
> * For searches and help try:
> *
> http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/