Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: RE: RE: grouping variables within individuals
From
Nick Cox <[email protected]>
To
"'[email protected]'" <[email protected]>
Subject
st: RE: RE: RE: grouping variables within individuals
Date
Tue, 31 Aug 2010 11:23:49 +0100
Two complementary ways of thinking about it:
1. -numericterm- within -id- just defines "spells". For a way of thinking about spells in Stata, see
SJ-7-2  dm0029  . . . . . . . . .  Speaking Stata: Identifying spells
        N. J. Cox
        Q2/07   SJ 7(2):249--265   (no commands)
        shows how to handle spells with complete control over
        spell specification
and (independently of that) -tsspell- from SSC. For the -tsspell- approach to work, you'd need to define a pseudo-time variable, say
. bysort id (numericterm) : gen pstime = _n
and to
. tsset id pstime
But after defining the spells you'd then just ignore the pseudotime variable which has served its purpose.
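A minimal sketch of the -tsspell- route, on made-up data (the values below are illustrative only, not taken from the thread). It assumes -tsspell- has been installed from SSC and that its default output variables _spell, _seq and _end are used:

```stata
* one-time install from SSC (needs an internet connection)
* ssc install tsspell

clear
input int id byte numericterm
1 1
1 1
1 2
1 5
1 5
2 2
2 2
2 3
end

* pseudo-time: observation order within id after sorting on term
bysort id (numericterm) : gen pstime = _n
tsset id pstime

* a new spell starts whenever numericterm changes within id
tsspell numericterm

* _spell numbers the terms within each id, regardless of gaps in numericterm
list id numericterm _spell _seq _end, sepby(id) noobs
```

Here _spell plays the role of the "count of term": student 1's term 5 is that student's third spell even though terms 3 and 4 are absent.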
2. It may be worth noting that
bysort id numericterm: gen byte new=_n==1
is precisely equivalent to
egen tag = tag(id numericterm)
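A small check of that equivalence on made-up data. Both variables should flag exactly one observation per (id, numericterm) group; which observation within a group carries the 1 is not something this sketch relies on, so the comparison is done on group totals:

```stata
clear
input int id byte numericterm
1 1
1 1
1 2
2 2
2 2
2 3
2 3
end

* first observation in each id-term group, by hand
bysort id numericterm: gen byte new = _n == 1

* the same idea via egen
egen tag = tag(id numericterm)

* each group should contain exactly one flagged observation either way
bysort id numericterm: egen n_new = total(new)
bysort id numericterm: egen n_tag = total(tag)
assert n_new == 1 & n_tag == 1
```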
Nick
[email protected]
Martin Weiss
One way to get the order of the terms right is to attach a -label-
clarifying the order and -encode- using this -label-:
***********
clear*
input int id str10 term byte(class grade)
475 "Spr 05" 4 4
475 "Spr 05" 7 0
475 "Fall 05" 7 0
475 "Fall 05" 7 0
475 "Spr 06 " . 2
475 "Spr 06 " . 0
475 "Spr 06 " . 0
475 "Fall 06" . 3
475 "Fall 06" . 0
476 "Fall 05" 5 4
476 "Fall 05" 6 4
476 "Fall 05" 3 4
476 "Fall 05" 2 4
476 "Fall 05" 1 -1
476 "Fall 05" 4 4
476 "Spr 06 " . 1
476 "Spr 06 " . 4
476 "Spr 06 " . 4
476 "Sum 06 " . 3
476 "Sum 06 " . -3
476 "Fall 06" . 3
476 "Fall 06" . 3
476 "Fall 06" . 4
476 "Fall 06" . 4
476 "Fall 06" . 4
477 "Fall 05" 6 4
477 "Fall 05" 2 4
477 "Fall 05" 5 4
477 "Fall 05" 3 4
477 "Fall 05" 4 4
477 "Fall 05" 1 -1
477 "Spr 06 " . 4
477 "Spr 06 " . 0
477 "Spr 06 " . 0
477 "Spr 06 " . 2
477 "Sum 06 " . 4
477 "Fall 06" . 0
477 "Fall 06" . 2
477 "Fall 06" . 0
477 "Spr 07 " . 3
477 "Spr 07 " . 0
477 "Spr 07 " . 0
477 "Fall 07" . 3
477 "Fall 07" . 2
477 "Fall 07" . 3
477 "Spr 08 " . 2
477 "Spr 08 " . 3
477 "Spr 08 " . 4
477 "Spr 08 " . 3
end
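replace term=trim(term)
compress
list, noobs sepby(id)
* value label fixing the chronological order of the terms
label define myterms 1 "Spr 05" 2 "Fall 05" 3 "Spr 06" /*
*/ 4 "Sum 06" 5 "Fall 06" 6 "Spr 07" 7 "Fall 07" /*
*/ 8 "Spr 08"
* encode using that label, so numericterm sorts chronologically
encode term, gen(numericterm) label(myterms)
* flag the first observation of each term within id
bysort id numericterm: gen byte new=_n==1
* the running sum of the flags is the term count within id
by id: gen termvar=sum(new)
drop new term
list, sepby(id numericterm) noobs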
***********
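Once -termvar- exists, the per-term summaries asked for in the original question can be sketched along these lines. The data are a made-up extract, and the pass rule (grade of 1 or more) is an assumption for illustration, since the thread does not say which grades count as passing:

```stata
clear
input int id byte(termvar class grade)
475 1 4  4
475 1 7  0
475 2 7  0
475 2 7  0
476 1 5  4
476 1 1 -1
476 2 .  1
476 2 .  4
end

* number of class records in each student-term
bysort id termvar: gen nclasses = _N

* ASSUMPTION: a class is passed if grade >= 1; missing grades count as not passed
gen byte passed = grade >= 1 & !missing(grade)

* percent of classes passed in each student-term
by id termvar: egen pctpassed = mean(100 * passed)

* show one row per student-term for the first and second terms
egen byte pick = tag(id termvar)
list id termvar nclasses pctpassed if pick & termvar <= 2, sepby(id) noobs
```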
Martin Weiss
The basic technique is shown here. See NJC's article at
http://www.stata-journal.com/sjpdf.html?articlenum=pr0004
***********
clear*
// :mylabel , auto // str10 double byte
input int id str10 term byte(class grade)
475 "Spr 05" 4 4
475 "Spr 05" 7 0
475 "Fall 05" 7 0
475 "Fall 05" 7 0
475 "Spr 06 " . 2
475 "Spr 06 " . 0
475 "Spr 06 " . 0
475 "Fall 06" . 3
475 "Fall 06" . 0
476 "Fall 05" 5 4
476 "Fall 05" 6 4
476 "Fall 05" 3 4
476 "Fall 05" 2 4
476 "Fall 05" 1 -1
476 "Fall 05" 4 4
476 "Spr 06 " . 1
476 "Spr 06 " . 4
476 "Spr 06 " . 4
476 "Sum 06 " . 3
476 "Sum 06 " . -3
476 "Fall 06" . 3
476 "Fall 06" . 3
476 "Fall 06" . 4
476 "Fall 06" . 4
476 "Fall 06" . 4
477 "Fall 05" 6 4
477 "Fall 05" 2 4
477 "Fall 05" 5 4
477 "Fall 05" 3 4
477 "Fall 05" 4 4
477 "Fall 05" 1 -1
477 "Spr 06 " . 4
477 "Spr 06 " . 0
477 "Spr 06 " . 0
477 "Spr 06 " . 2
477 "Sum 06 " . 4
477 "Fall 06" . 0
477 "Fall 06" . 2
477 "Fall 06" . 0
477 "Spr 07 " . 3
477 "Spr 07 " . 0
477 "Spr 07 " . 0
477 "Fall 07" . 3
477 "Fall 07" . 2
477 "Fall 07" . 3
477 "Spr 08 " . 2
477 "Spr 08 " . 3
477 "Spr 08 " . 4
477 "Spr 08 " . 3
end
replace term=trim(term)
compress
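list, noobs sepby(id)
* flag the first observation of each term within id
* (term is a string here, so it sorts alphabetically)
bysort id term: gen byte new=_n==1
* the running sum of the flags numbers the terms within id
by id: gen termvar=sum(new)
list, sepby(id term) noobs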
***********
You have to let Stata know the ordering of the terms, though...
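One way to supply that ordering without typing out a value label is to build a sortable number from the term string. This is only a sketch on made-up rows: it assumes every term has the form "<season> <yy>", that the seasons run Spr, Sum, Fall within a calendar year, and that all years are 20yy. The variables season, year, seasonorder and termorder are names introduced here for illustration.

```stata
clear
input int id str10 term
475 "Spr 05"
475 "Fall 05"
475 "Fall 05"
475 "Spr 06"
476 "Fall 05"
476 "Spr 06"
476 "Sum 06"
end

* split the string into season and year (ASSUMPTION: years are 20yy)
gen str4 season = word(term, 1)
gen int year = 2000 + real(word(term, 2))

* ASSUMPTION: calendar order within a year is Spr < Sum < Fall
gen byte seasonorder = cond(season == "Spr", 1, ///
    cond(season == "Sum", 2, cond(season == "Fall", 3, .)))

* one sortable number per term, e.g. "Fall 05" -> 20053
gen long termorder = 10 * year + seasonorder

* same first-observation-and-running-sum technique, now in chronological order
bysort id termorder: gen byte new = _n == 1
by id: gen termvar = sum(new)
list id term termorder termvar, sepby(id) noobs
```

Sorting on the string itself would put "Fall 05" before "Spr 05", which is why student 475's terms come out in the wrong order without some such step.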
Devora Shamah
I am working with a dataset that contains grade records for students over
several terms. Each student has 1-6 classes per term. I need to assign each
term a "count of term" so I can create variables that capture the total
number of classes taken during the students' first and second term and the
percent of classes they have passed. The students all started at different
terms and not all of them took classes in consecutive terms. I have easily
identified the first term by using the row minimum command in Stata. I am
struggling to find an efficient and accurate way to identify the second
term. I would appreciate any thoughts anyone has.
My data look like this (in long form). The class values refer to type of
class, and grades range from withdrawals through A's. Essentially I need a
way to identify that for student 475, Spring 05 was his first term and Fall
05 was his second term, for student 476 Fall 05 was his first term and
Spring 06 was his second term, and so on.
id   term     class  grade
475  Spr 05   4      4
475  Spr 05   2      2
475  Spr 05   5      3
475  Spr 05   3      2
475  Spr 05   1      -1
475  Spr 05   6      3
475  Fall 05  7      0
475  Fall 05  7      0
475  Fall 05  7      0
475  Spr 06   .      2
475  Spr 06   .      0
475  Spr 06   .      0
475  Fall 06  .      3
475  Fall 06  .      0
476  Fall 05  5      4
476  Fall 05  6      4
476  Fall 05  3      4
476  Fall 05  2      4
476  Fall 05  1      -1
476  Fall 05  4      4
476  Spr 06   .      1
476  Spr 06   .      4
476  Spr 06   .      4
476  Sum 06   .      3
476  Sum 06   .      -3
476  Fall 06  .      3
476  Fall 06  .      3
476  Fall 06  .      4
476  Fall 06  .      4
476  Fall 06  .      4
477  Fall 05  6      4
477  Fall 05  2      4
477  Fall 05  5      4
477  Fall 05  3      4
477  Fall 05  4      4
477  Fall 05  1      -1
477  Spr 06   .      4
477  Spr 06   .      0
477  Spr 06   .      0
477  Spr 06   .      2
477  Sum 06   .      4
477  Fall 06  .      0
477  Fall 06  .      2
477  Fall 06  .      0
477  Spr 07   .      3
477  Spr 07   .      0
477  Spr 07   .      0
477  Fall 07  .      3
477  Fall 07  .      2
477  Fall 07  .      3
477  Spr 08   .      2
477  Spr 08   .      3
477  Spr 08   .      4
477  Spr 08   .      3