Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Jorge Eduardo Pérez Pérez <perez.jorge@ur.edu.co> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Variable running totals |
Date | Thu, 31 May 2012 18:42:28 -0400 |
Observations 2 and 3 in this dataset are identical, shouldn't the count be 3 in both? Up to date 1002, id 1 has appeared 3 times, 1 on date 1000 and 2 on date 1002. I hope that makes sense, but it is only a way of justifying that the following code gives code30=3 for all the observations with date==1002. It's not as simple as it could be, it loops through ids, but it is better than looping through observations. levelsof id, local(ids) foreach x in `ids' { preserve keep if id==`x' * Deal with duplicates before -tsset- duplicates tag date, gen(dup) duplicates drop date, force tsset date tsfill gen t=_n * Require at least 60 obs for moving average if _N<60 { tsappend, add(`=60-_N') } gen b=(id!=.) * Count duplicates too replace b=b+dup if dup!=. tssmooth ma count30=b, window(29 1 0) replace count30=30*count30 * Replace for simple count for the first 30 days replace count30=sum(b) if t<30 keep if id!=. keep id date count30 tempfile `x' save ``x'' restore } preserve clear foreach x in `ids' { append using ``x'' } tempfile count save `count' restore merge n:1 id date using `count', keep(match master) drop _merge br id date count30 1 1000 1 1 1002 3 1 1002 3 1 1200 1 1 1250 1 2 1050 1 2 1059 2 2 1085 2 On Thu, May 31, 2012 at 4:27 PM, Schaffer, Mark E <M.E.Schaffer@hw.ac.uk> wrote: > > Hi all. "Variable running totals" isn't the best description of the > problem, but it's not too far off. > > A colleague has written to me with the following problem. He has a > panel dataset with two variables: id and date. (He has some other > variables but those are the two that matter.) There may be multiple > observations on id for a given date. The date variable is in Stata %td > format (#days after 01jan1960). So it looks like this: > > id date > 1 1000 > 1 1002 > 1 1002 > 1 1200 > 1 1250 > 2 1050 > 2 1059 > 2 1085 > > ...etc. > > > The question is, how to construct a variable that counts the number of > observations that an individual (id) appears in the dataset up to 30 > days previously. If we call the variable count30, it would look like > this: > > id date count30 > 1 1000 1 > 1 1002 2 > 1 1002 3 > 1 1200 1 > 1 1250 1 > 2 1050 1 > 2 1059 2 > 2 1085 2 > > ...etc. > > I suspect there's an easy way of doing this, but the only ways I could > think of involved brute force looping through observations. > > Any ideas? > > --Mark > > > -- > Heriot-Watt University is the Sunday Times > Scottish University of the Year 2011-2012 > > Heriot-Watt University is a Scottish charity > registered under charity number SC000278. > > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ > > * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/