Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Sergiy Radyakin <serjradyakin@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: How to generate lags where each variable to be lagged has multiple values in the previous time periods |
Date | Mon, 29 Apr 2013 15:39:33 -0400 |
Or explicitly the following code should achieve the same (most of the code is data generation though).

Best, Sergiy Radyakin.

    program drop _all

    program define generate_example_data
        clear
        generate SchoolID=.
        generate Year=.
        generate Grade=.
        generate Score=.

        forval y=2000/2013 {
            forval g=1/12 {
                forval sch=1000/1002 {
                    quietly {
                        set obs `=_N+1'
                        replace Year=`y' in L
                        replace SchoolID=`sch' in L
                        replace Grade=`g' in L
                        replace Score=runiform()*100 in L
                    }
                }
            }
        }

        format Score %6.2f
        sort SchoolID Year Grade
    end

    program define smartlag, sortpreserve
        version 9.0
        syntax varlist, over(varlist) [stub(string) by(varlist)]
        if (`"`stub'"'=="") local stub="lag_"

        isid `over' `by'

        tempfile laggeddata
        preserve
        foreach v in `over' {
            quietly replace `v'=`v'+1
        }
        foreach v in `varlist' {
            rename `v' `stub'`v'
        }
        sort `over' `by'
        save `"`laggeddata'"'
        restore

        sort `over' `by'
        merge `over' `by' using `"`laggeddata'"', nokeep
        drop _merge
    end

    generate_example_data
    list, sepby(SchoolID)

    smartlag Score, stub(lag_) by(SchoolID) over(Year Grade)

    sort SchoolID Year Grade
    list SchoolID Year Grade Score lag_Score, sepby(SchoolID)

On Mon, Apr 29, 2013 at 3:32 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> Focus on any cohort, say the cohort that was grade 8 in 2011, grade 7
> in 2010 and so forth. Evidently, the difference (year - grade) is
> constant, and therefore an identifier, for that cohort. Thus after
>
> gen id = year - grade
>
> either
>
> tsset id year
>
> or
>
> tsset id grade
>
> defines a panel dataset with an identifier and a time variable and
> time series operators can then be applied.
>
> Nick
> njcoxstata@gmail.com
>
> On 29 April 2013 19:46, Stuart Buck <stuartbuck@gmail.com> wrote:
>
>> Passage rates for all Texas schools for 2008, 2009, 2010, and 2011 --
>> this is important -- by grade. So each row in the dataset is School,
>> Year, Grade, and then scores (plus other demographic variables, etc.).
>>
>> In other words, the dataset looks like this:
>>
>> Year  SchoolID  Grade  TestScore
>> 2011  1         6      ***
>> 2011  1         7      ***
>> 2011  1         8      ***
>>
>> And so on and so forth -- multiple grades in each school in each year.
>>
>> Here's what I want:
>>
>> To be able to regress any given school's performance in Grade X in
>> Year T on, among other things, how that same school did with the same
>> cohort of kids in the previous grade (Grade X-1) in the previous year
>> (Year T-1). I.e., if a middle school's Grade 8 passage rate in 2011 is
>> the outcome, I'd like to be able to control for that same school's
>> Grade 7 passage rate in 2010, thus giving a somewhat crude measure of
>> how much that group of kids progressed since the previous year.
>>
>> How would I generate an all-purpose lagged TestScore variable for all
>> the schools in the dataset, lagging by both year and grade at once?
>> All the Stata instructional material I see on lagged variables just
>> lags based on time, not on both time and some other variable too
>> (grade).
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
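
For comparison, the cohort-identifier suggestion quoted above could be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes the variable names from the example data (SchoolID, Year, Grade, Score), and the names cohort, panel and lag_Score_ts are made up here. Because the quoted "gen id = year - grade" does not distinguish schools, the sketch combines the cohort with SchoolID before calling tsset.

```stata
* Sketch only: assumes SchoolID, Year, Grade and Score exist,
* with one observation per school, year and grade.
* cohort, panel and lag_Score_ts are hypothetical names.

* Year - Grade is constant for a cohort as it moves up one grade per year
generate cohort = Year - Grade

* One panel per school-cohort combination
egen panel = group(SchoolID cohort)

* Declare the panel structure, then use the lag operator
tsset panel Year
generate lag_Score_ts = L.Score
```

If the assumptions hold, lag_Score_ts for Grade X in Year T should hold the same school's Score for Grade X-1 in Year T-1, and it is missing where no such earlier observation exists (for example Grade 1, or the first year in the data).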