The main issue here seems to be getting Stata to
be smart enough to recognise (for example)
that "GRANT" and "SONYA GRANT" are the same
person. You could try working in terms of
last name only, which would be
word(teacher, -1)
-- but this might create the opposite problem
of conflating different teachers.
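
As a rough sketch of the last-name idea, one could flag last names
that appear under more than one spelling within a school. The toy
data and the names lastname, tag and nvariants are made up for
illustration; schirn and teacher are taken from the question.

```stata
* Sketch only: toy data standing in for the real file
clear
input schirn str20 teacher
2766 "GRANT"
2766 "SONYA GRANT"
2766 "SONYA GRANT"
2766 "L SMITH"
2758 "SMITH"
2758 "MILLER"
end

* Last word of the name, after trimming stray blanks
generate lastname = word(trim(teacher), -1)

* Tag one observation per distinct school-teacher spelling
egen byte tag = tag(schirn teacher)

* Number of distinct spellings sharing a last name within a school
bysort schirn lastname: egen nvariants = total(tag)

* List only the candidates for recoding
list schirn teacher lastname if tag & nvariants > 1, sepby(schirn)
```

The listing is only a set of candidates to check by eye. An entry
such as "FLEMING RACHE", which looks like last name first, would be
keyed on "RACHE", and two different teachers who share a last name
within a school would be flagged as well.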
Alternatively there are various handles
in -groups- on SSC that might be useful.
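
-groups- can be installed with "ssc install groups". Without
installing anything, a single compact listing can also be sketched
with official commands: -contract- reduces the data to one row per
distinct school-teacher pair, and -list- with sepby() prints them as
one table. The toy data and the name n are illustrative assumptions.

```stata
* Sketch only: toy data standing in for the real file
clear
input schirn str20 teacher
2758 "SMITH"
2758 "SMITH"
2758 "MILLER"
2766 "GRANT"
2766 "SONYA GRANT"
2766 "SONYA GRANT"
end

* contract replaces the data in memory, so keep a copy to go back to
preserve

* One row per distinct school-teacher pair, with its frequency in n
contract schirn teacher, freq(n)

* One table, with a separator line between schools
list schirn teacher n, sepby(schirn) noobs

restore
```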
Nick
Gushta, Matthew
> i have a dataset containing student test scores. within this data are
> district, school, and teacher variables. i will be running a mixed
> model incorporating all of these variables. unfortunately, the teacher
> variable is a manually-entered string variable. this means that within
> school X, there might be teachers A, B, and C, but due to variations
> in data entry, the same teacher may appear under different names.
>
> in order to QC this and recode teacher values where appropriate, i
> would like to basically crosstab school and teacher variables, so that
> only unique teacher values appear within each school. you can see that
> each school is presented in a separate table and teacher "grant"
> appears twice in school 2766 (see the syntax and sample output below).
>
> ...given 2105 districts and 5262 teachers, this output is quite
> cumbersome.
>
> is there a simpler, more compressed format for such output? i.e., a
> single table?
> bysort schirn: tab teacher
>
> **************************************************
> OUTPUT
> --------------------------------------------------
> -> schirn = 2758
>
>       TEACHER |      Freq.     Percent        Cum.
> --------------+-----------------------------------
>      HANTHORX |         14       31.11       31.11
>        MILLER |         15       33.33       64.44
>         SMITH |         16       35.56      100.00
> --------------+-----------------------------------
>         Total |         45      100.00
>
> --------------------------------------------------
> -> schirn = 2766
>
>       TEACHER |      Freq.     Percent        Cum.
> --------------+-----------------------------------
>      CAMPBELL |         24        7.50        7.50
>     DOLORESCO |         23        7.19       14.69
> FLEMING RACHE |         25        7.81       22.50
>         GRANT |          1        0.31       22.81
>          HAAS |         25        7.81       30.63
>      HARRISON |         25        7.81       38.44
>         JONES |         25        7.81       46.25
>       L SMITH |         25        7.81       54.06
>         LABUS |         25        7.81       61.88
>         OWENS |         25        7.81       69.69
>       SMIALEK |         22        6.88       76.56
>   SONYA GRANT |         25        7.81       84.38
>      STAUFFER |         25        7.81       92.19
>       WELLING |         25        7.81      100.00
> --------------+-----------------------------------
>         Total |        320      100.00
>