I posted untested code. If someone finds it interesting but reports that it does not work when the stated assumptions are correct than I will look at it again.
Nick
[email protected]
Martin Weiss
What is the result of your code for you? I get the correct values for the
score (as in my -collapse- solution), but no dates...
*************
clear*
input id Year str10 MathDate MathScore str10 EngDate EngScore
1 2002 "" . "6/21/02" 85
1 2003 "06/20/03" 55 "" .
1 2004 "" . "" .
1 2005 "" . "" .
2 2002 "" . "" .
2 2003 "06/20/03" 40 "" .
2 2004 "05/31/04" 55 "" .
2 2005 "" . "6/15/05" 65
end
compress
gen mathdate=date(MathDate, "MD20Y")
gen engdate=date(EngDate, "MD20Y")
format mathdate engdate %tdNN/DD/CCYY
gen date = min(mathdate, engdate)
bysort id (date) : ///
replace MathScore = MathScore[_n-1]/*
*/ if missing(MathScore)
by id: replace EngScore = EngScore[_n-1] /*
*/ if missing(EngScore)
by id: keep if _n == _N
list, noobs
*************
My result:
+---------------------------------------------------------------------------
-------+
| id Year MathDate MathSc~e EngDate EngScore mathdate
engdate date |
|---------------------------------------------------------------------------
-------|
| 1 2005 55 85 .
. . |
| 2 2002 55 65 .
. . |
+---------------------------------------------------------------------------
-------+
HTH
Martin
-----Ursprüngliche Nachricht-----
Von: [email protected]
[mailto:[email protected]] Im Auftrag von Nick Cox
Gesendet: Mittwoch, 15. Juli 2009 17:43
An: [email protected]
Betreff: st: RE: AW: Reshaping Data File
Here's another way to do it. I assume, unlike Martin, that *Date are
Stata date variables.
gen date = min(MathDate, EngDate)
bysort id (date) : ///
replace MathScore = MathScore[_n-1] if missing(MathScore)
by id: replace EngScore = EngScore[_n-1] if missing(EngScore)
by id: keep if _n == _N
Nick
[email protected]
Martin Weiss
============
If it is all about the results:
clear*
input id Year str10 MathDate MathScore str10 EngDate EngScore
1 2002 "" . "6/21/02" 85
1 2003 "06/20/03" 55 "" .
1 2004 "" . "" .
1 2005 "" . "" .
2 2002 "" . "" .
2 2003 "06/20/03" 40 "" .
2 2004 "05/31/04" 55 "" .
2 2005 "" . "6/15/05" 65
end
compress
list, noobs sepby(id)
/*
Not really necessary...
gen mathdate=date(MathDate, "DMY")
format mathdate %tdMonth_DD,_CCYY
gen engdate=date(EngDate, "DMY")
format engdate %tdMonth_DD,_CCYY
*/
collapse (lastnm) MathScore /*
*/ EngScore, by(id)
list, noobs sepby(id)
Meryle Weinstein, Ph.D.
=======================
I have a dataset that contains 4 records per student representing scores
on
tests taken in each year. Each student may or may not have taken the
exam
in a particular year, or may have taken the same test in multiple years.
I
want to create a dataset that has one record for each student that has
each
of the tests taken, regardless of year and if taken multiple times the
last
test results.
Here's what the file looks like
Id Year MathDate MathScore EngDate EngScore
1 2002 6/21/02 85
1 2003 06/20/03 55
1 2004
1 2005
2 2002
2 2003 06/20/03 40
2 2004 05/31/04 55
2 2005 6/15/05 65
I thought I could do a reshape wide but there are lots of cells with no
data, plus some cells with multiple data for the same student.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/