Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Collapsing and Reshaping
From: Eric Booth <[email protected]>
To: "<[email protected]>" <[email protected]>
Subject: Re: st: Collapsing and Reshaping
Date: Tue, 30 Nov 2010 04:34:04 +0000
It's not clear why you are missing year information: you've got scores from those years, so you could fill in year wherever the corresponding score is not missing. You do need a year value to collapse and/or reshape and, no, filling it with a "numeric dummy" just to complete the command is generally not a good idea (how would you differentiate the records across id_codes after the collapse/reshape?).
In the code below, I am assuming that the math* and eng* variables have non-missing values (test scores) only in the rows for the matching year, e.g., math06 and eng06 are filled in only for 2006 records. If this is not how your data are set up, it's probably a good idea to clarify with a snippet of your data (or a fake equivalent). You only need to collapse if you've got multiple tests for the same year (e.g., in my example data below, id_code 113 takes the 2008 tests twice). As a side note, you don't have to collapse this information: you could instead create math_1_08, math_2_08, etc., up to the maximum number of tests any student takes in a year (the "num_tests_taken" variable below counts these for you). After that, you can create an id for each observation to use in the reshape command, which will get rid of your "id_code does not uniquely identify the observations" error message.
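As a quick sanity check on that assumption, separate from the main example, something like the following could flag rows where a recorded year disagrees with the year implied by a non-missing score. This is only a sketch on made-up data; "mismatch" is a hypothetical helper variable, and the second row is deliberately inconsistent so the check has something to find.

```stata
* toy data (hypothetical): the second row has 2007 scores but year == 2006
clear
input id_code year math06 eng06 math07 eng07 math08 eng08
112 2006 7 3 . . . .
112 2006 . . 8 6 . .
112 .    . . . . 4 2
end

* flag rows whose non-missing scores do not match a non-missing year
g byte mismatch = 0
forval n = 6/8 {
    foreach v in math0 eng0 {
        replace mismatch = 1 if !mi(`v'`n') & !mi(year) & year != 200`n'
    }
}
li if mismatch
count if mismatch
```

If the count is zero in your own data, filling in year from the scores (as in step 1 below) should be safe; if not, the flagged rows need a closer look first.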
***************!
**Watch for wrapping issues in the code below.
clear
inp id_code year math06 eng06 math07 eng07 math08 eng08
112 2006 7 3 . . . .
112 2007 . . 8 6 . .
112 . . . . . 4 2
112 . . . . . . .
113 2006 3 2 . . . .
113 2007 . . 9 3 . .
113 . . . . . 2 2
113 2008 . . . . 2 8
114 . 1 4 . . . .
114 2007 . . 7 0 . .
114 2008 . . . . 6 6
end
//1. collapse//
**note: id_code 113 took 2 tests in 2008
**first, fill in year based on presence of math/eng scores
forval n = 6/8 {
    foreach v in math0 eng0 {
        replace year = 200`n' if !mi(`v'`n') & mi(year)
    }
}
**how many tests did that student take each year?
bys year id_code: g num_tests_taken = _N
li
**next, collapse
ds id_code year, not
collapse (mean) `r(varlist)', by(id_code year)
li
drop year //year isn't necessary
//2. reshape//
g id = _n
reshape long math0 eng0, i(id) j(yr)
**cleanup**
replace yr = 2000+yr
drop if mi(math0, eng0)
rename math0 math
rename eng0 eng
drop id
li
***************!
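The side note above (keeping each repeated test rather than averaging it away with collapse) could be sketched roughly as follows. This is a sketch under assumptions, not a tested recipe for your data: it uses made-up values, assumes year has already been filled in, and introduces "test_num" and "row" as hypothetical variable names. Since nothing in the data records which test came first, repeated tests are simply numbered in the order the rows appear.

```stata
* toy data (hypothetical values): id_code 113 takes the 2008 tests twice
clear
input id_code year math06 eng06 math07 eng07 math08 eng08
113 2006 3 2 . . . .
113 2007 . . 9 3 . .
113 2008 . . . . 2 2
113 2008 . . . . 2 8
114 2007 . . 7 0 . .
end

* number the tests within each id_code/year, keeping the original row order
g long row = _n
bys id_code year (row): g test_num = _n
drop row

* copy each year's scores into a single math and a single eng column
g math = .
g eng = .
forval n = 6/8 {
    replace math = math0`n' if year == 200`n'
    replace eng = eng0`n' if year == 200`n'
}

keep id_code year test_num math eng
isid id_code year test_num   // errors out if these do not identify the rows
li, sepby(id_code)
```

This leaves the data long with one row per test. If a wide layout per student-year is preferred, `reshape wide math eng, i(id_code year) j(test_num)` should then give math1, eng1, math2, eng2.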
- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
Fax: +979.845.0249
http://ppri.tamu.edu
On Nov 29, 2010, at 9:53 PM, Katie and Matt O'Varanese wrote:
> I am trying to reshape some data long... right now the data looks like
> this for example:
>
> id_code year math06 math07 math08 eng06 eng07 eng08
>
> 112 2006
> 112 2007
> 112 .
> 112 .
> 113 2006
> 113 2007
> 113 .
> 113 2008
> 114 .
> 114 2007
> 114 2008
>
> When I try: reshape long [vars@x], i(id_code) j(year)
> i get an error message "i=id_code does not uniquely identify the observations;
> there are multiple observations with the same value of id_code."
> Do I need to collapse first? If so, when I try to collapse by
> (id_code year), I get an error message saying that the year variable
> has missing values.
>
> Do I need to change the missing values to a numeric dummy just to
> complete the command? or is there a better way to do this? Ultimate
> goal: I want to have unique id_code and then a year variable with each
> of the three years represented (06, 07, 08).
>
>
> Help please! Thanks as always!!
>
> Kate
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/