st: RE: RE: RE: grouping variables within individuals

From	Nick Cox <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	st: RE: RE: RE: grouping variables within individuals
Date	Tue, 31 Aug 2010 11:23:49 +0100

Two complementary ways of thinking about it: 

1. -numericterm- within -id- just defines "spells". For a way of thinking about spells in Stata, see

SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q2/07   SJ 7(2):249--265                                 (no commands)
        shows how to handle spells with complete control over
        spell specification

and (independently of that) -tsspell- from SSC. For the -tsspell- approach to work, you'd need to define a pseudo-time variable, say 

. bysort id (numericterm) : gen pstime = _n 

and to 

. tsset id pstime 

After defining the spells, you can ignore the pseudo-time variable, which has by then served its purpose.
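
As a minimal sketch of the -tsspell- route on a toy dataset (the data values are made up for illustration, and -tsspell- must be installed from SSC): given just a variable name, -tsspell- starts a new spell whenever that variable changes, so after -tsset-ting on the pseudo-time variable the default output variable _spell should number the terms within each id.

```stata
* Sketch only: toy data, not the poster's dataset
capture which tsspell
if _rc ssc install tsspell

clear
input int id byte numericterm
475 1
475 1
475 2
475 3
476 2
476 2
476 3
end

* pseudo-time: observation order within id, sorted by term
bysort id (numericterm): gen pstime = _n
tsset id pstime

* a new spell starts whenever numericterm changes within id;
* by default tsspell creates _spell, _seq and _end
tsspell numericterm

list id numericterm pstime _spell _seq _end, sepby(id) noobs
```

Here _spell plays the role of the "count of term" variable, while _seq and _end mark the position within each term and its last observation.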

2. It may be worth noting that 

bysort id numericterm: gen byte new=_n==1

is precisely equivalent to 

egen tag = tag(id numericterm)
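
A small check of that equivalence on made-up data, offered as a sketch rather than a proof. Two caveats apply as far as I know: -egen, tag()- gives 0 to observations with missing values in the variable list unless the -missing- option is specified, and which observation within a group carries the flag need not be the same under the two approaches. The check therefore compares flags per group rather than per observation.

```stata
* Sketch only: toy data, no missing values
clear
input int id byte numericterm
475 1
475 1
475 2
476 2
476 3
476 3
476 3
end

bysort id numericterm: gen byte new = _n == 1
egen tag = tag(id numericterm)

* each (id, numericterm) group should carry exactly one flag either way
egen n_new = total(new), by(id numericterm)
egen n_tag = total(tag), by(id numericterm)
assert n_new == 1 & n_tag == 1

* a typical use of the tag: number of distinct terms per student
egen nterms = total(tag), by(id)
list, sepby(id) noobs
```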

Nick 
[email protected] 

Martin Weiss

One way to get the order of the terms right is to attach a -label-
clarifying the order and -encode- using this -label-:


***********
clear*

input int id str10 term  byte(class  grade)
475  "Spr 05"    4              4
475  "Spr 05"    7        0
475  "Fall 05"    7        0
475  "Fall 05"    7        0
475  "Spr 06 "    .        2
475  "Spr 06 "    .        0
475  "Spr 06 "    .        0
475  "Fall 06"    .         3
475  "Fall 06"    .         0
476  "Fall 05"    5        4
476  "Fall 05"    6        4
476  "Fall 05"    3        4
476  "Fall 05"    2        4
476  "Fall 05"    1        -1
476  "Fall 05"    4        4
476  "Spr 06 "    .        1
476  "Spr 06 "    .        4
476  "Spr 06 "    .        4
476  "Sum 06 "    .     3
476  "Sum 06 "    .     -3
476  "Fall 06"    .         3
476  "Fall 06"    .         3
476  "Fall 06"    .         4
476  "Fall 06"    .         4
476  "Fall 06"    .         4
477  "Fall 05"    6        4
477  "Fall 05"    2        4
477  "Fall 05"    5        4
477  "Fall 05"    3        4
477  "Fall 05"    4        4
477  "Fall 05"    1        -1
477  "Spr 06 "     .        4
477  "Spr 06 "     .        0
477  "Spr 06 "     .        0
477  "Spr 06 "     .        2
477  "Sum 06 "     .     4
477  "Fall 06"     .         0
477  "Fall 06"     .         2
477  "Fall 06"     .         0
477  "Spr 07 "     .        3
477  "Spr 07 "     .        0
477  "Spr 07 "     .        0
477  "Fall 07"     .         3
477  "Fall 07"     .         2
477  "Fall 07"     .         3
477  "Spr 08 "     .        2
477  "Spr 08 "     .        3
477  "Spr 08 "     .        4
477  "Spr 08 "     .        3
end


replace term=trim(term)
compress
list, noo sepby(id)

la def myterms 1 "Spr 05" 2 "Fall 05" 3 "Spr 06" /* 
*/ 4 "Sum 06" 5 "Fall 06" 6 "Spr 07" 7 "Fall 07" /* 
*/ 8 "Spr 08"

encode term, gen(numericterm) label(myterms)

bysort id numericterm: gen byte new=_n==1
by id:gen termvar=sum(new)
drop new term

l, sepby(id numericterm) noo
***********
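
Once a term counter exists, the per-student summaries asked for in the original question could be built along the following lines. This is a sketch on made-up data; the variable names passed, nclass*, npass* and pctpass* are hypothetical, and the rule that a grade of 1 or more counts as a pass is an assumption that should be replaced by the actual grading scheme.

```stata
* Sketch only: toy data, hypothetical variable names
clear
input int id byte(numericterm grade)
475 1 4
475 1 0
475 2 0
475 2 3
475 3 2
476 2 4
476 2 -1
476 2 4
476 3 1
end

* term counter within student, as in the code above
bysort id numericterm: gen byte new = _n == 1
by id: gen termvar = sum(new)
drop new

* ASSUMPTION: grade >= 1 is a pass; missing grades stay missing
gen byte passed = grade >= 1 if !missing(grade)

* classes taken, classes passed and percent passed in terms 1 and 2
forvalues t = 1/2 {
    egen nclass`t' = total(termvar == `t'), by(id)
    egen npass`t' = total(passed == 1 & termvar == `t'), by(id)
    gen pctpass`t' = 100 * npass`t' / nclass`t'
}

list, sepby(id) noobs
```

Note that a class with a missing grade is still counted in nclass* but never in npass*; whether that is the right denominator depends on the analysis.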

Martin Weiss

The basic technique is shown here. See NJC's
http://www.stata-journal.com/sjpdf.html?articlenum=pr0004


***********
clear*

   // :mylabel , auto // str10 double byte
input int id str10 term   byte(class  grade)
475        "Spr 05"    4              4
475        "Spr 05"    7              0
475        "Fall 05"    7              0
475        "Fall 05"    7              0
475        "Spr 06 "    .              2
475        "Spr 06 "    .              0
475        "Spr 06 "    .              0
475        "Fall 06"    .               3
475        "Fall 06"    .               0
476        "Fall 05"    5              4
476        "Fall 05"    6              4
476        "Fall 05"    3              4
476        "Fall 05"    2              4
476        "Fall 05"    1              -1
476        "Fall 05"    4              4
476        "Spr 06 "    .              1
476        "Spr 06 "    .              4
476        "Spr 06 "    .              4
476        "Sum 06 "    .           3
476        "Sum 06 "    .           -3
476        "Fall 06"    .               3
476        "Fall 06"    .               3
476        "Fall 06"    .               4
476        "Fall 06"    .               4
476        "Fall 06"    .               4
477        "Fall 05"    6              4
477        "Fall 05"    2              4
477        "Fall 05"    5              4
477        "Fall 05"    3              4
477        "Fall 05"    4              4
477        "Fall 05"    1              -1
477        "Spr 06 "     .              4
477        "Spr 06 "     .              0
477        "Spr 06 "     .              0
477        "Spr 06 "     .              2
477        "Sum 06 "     .           4
477        "Fall 06"     .               0
477        "Fall 06"     .               2
477        "Fall 06"     .               0
477        "Spr 07 "     .              3
477        "Spr 07 "     .              0
477        "Spr 07 "     .              0
477        "Fall 07"     .               3
477        "Fall 07"     .               2
477        "Fall 07"     .               3
477        "Spr 08 "     .              2
477        "Spr 08 "     .              3
477        "Spr 08 "     .              4
477        "Spr 08 "     .              3
end


replace term=trim(term)
compress
list, noobs  sepby(id)

bysort id term: gen byte new=_n==1
by id:gen termvar=sum(new)
l, sepby(id term) noo
***********

You have to let Stata know the ordering of the terms, though...
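
One possible way to supply that ordering without typing a label by hand is to parse the term string. The sketch below uses made-up data and rests on two assumptions: every term looks like "Spr 05" (season, space, two-digit year in the 2000s), and within a calendar year the order is Spr, Sum, Fall. The variable names season, year, seasonord and termorder are hypothetical.

```stata
* Sketch only: toy data, hypothetical variable names
clear
input int id str10 term
475 "Spr 05"
475 "Fall 05"
476 "Fall 05"
476 "Spr 06"
476 "Sum 06"
476 "Fall 06"
end

replace term = trim(term)

* split "Spr 05" into season and year (ASSUMPTION: years are 20xx)
gen season = word(term, 1)
gen year = 2000 + real(word(term, 2))

* ASSUMPTION: within a year the order is Spr < Sum < Fall
gen byte seasonord = cond(season == "Spr", 1, cond(season == "Sum", 2, cond(season == "Fall", 3, .)))
assert !missing(year, seasonord)

* sortable term code, e.g. 20051 for Spr 05 and 20053 for Fall 05
gen int termorder = year * 10 + seasonord

bysort id termorder: gen byte new = _n == 1
by id: gen termvar = sum(new)
drop new

list, sepby(id) noobs
```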

Devora Shamah

I am working with a dataset that contains grade records for students over
several terms. Each student has 1-6 classes per term. I need to assign each
term a "count of term" so I can create variables that capture the total
number of classes taken during the students' first and second term and the
percent of classes they have passed. The students all started at different
terms and not all of them took classes in consecutive terms. I have easily
identified the first term by using the row minimum command in Stata. I am
struggling to find an efficient and accurate way to identify the second
term. I would appreciate any thoughts anyone has.

My data look like this (in long form). The class values refer to type of
class, and grades range from withdrawals through A's. Essentially I need a
way to identify that for student 475, Spring 05 was his first term and Fall
05 was his second term, for student 476 Fall 05 was his first term and
Spring 06 was his second term, and so on.

id            term      class       grade
475         Spr 05    4              4
475         Spr 05    2              2
475         Spr 05    5              3
475         Spr 05    3              2
475         Spr 05    1              -1
475         Spr 05    6              3
475         Fall 05    7              0
475         Fall 05    7              0
475         Fall 05    7              0
475         Spr 06                    2
475         Spr 06                    0
475         Spr 06                    0
475         Fall 06                    3
475         Fall 06                    0
476         Fall 05    5              4
476         Fall 05    6              4
476         Fall 05    3              4
476         Fall 05    2              4
476         Fall 05    1              -1
476         Fall 05    4              4
476         Spr 06                    1
476         Spr 06                    4
476         Spr 06                    4
476         Sum 06                 3
476         Sum 06                 -3
476         Fall 06                    3
476         Fall 06                    3
476         Fall 06                    4
476         Fall 06                    4
476         Fall 06                    4
477         Fall 05    6              4
477         Fall 05    2              4
477         Fall 05    5              4
477         Fall 05    3              4
477         Fall 05    4              4
477         Fall 05    1              -1
477         Spr 06                    4
477         Spr 06                    0
477         Spr 06                    0
477         Spr 06                    2
477         Sum 06                 4
477         Fall 06                    0
477         Fall 06                    2
477         Fall 06                    0
477         Spr 07                    3
477         Spr 07                    0
477         Spr 07                    0
477         Fall 07                    3
477         Fall 07                    2
477         Fall 07                    3
477         Spr 08                    2
477         Spr 08                    3
477         Spr 08                    4
477         Spr 08                    3


