Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Variable running totals
From
Jorge Eduardo Pérez Pérez <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Variable running totals
Date
Thu, 31 May 2012 18:42:28 -0400
Observations 2 and 3 in this dataset are identical, shouldn't the
count be 3 in both? Up to date 1002, id 1 has appeared 3 times, 1 on
date 1000 and 2 on date 1002.
I hope that makes sense, but it is only a way of justifying that the
following code gives code30=3 for all the observations with
date==1002. It's not as simple as it could be, it loops through ids,
but it is better than looping through observations.
levelsof id, local(ids)
foreach x in `ids' {
preserve
keep if id==`x'
* Deal with duplicates before -tsset-
duplicates tag date, gen(dup)
duplicates drop date, force
tsset date
tsfill
gen t=_n
* Require at least 60 obs for moving average
if _N<60 {
tsappend, add(`=60-_N')
}
gen b=(id!=.)
* Count duplicates too
replace b=b+dup if dup!=.
tssmooth ma count30=b, window(29 1 0)
replace count30=30*count30
* Replace for simple count for the first 30 days
replace count30=sum(b) if t<30
keep if id!=.
keep id date count30
tempfile `x'
save ``x''
restore
}
preserve
clear
foreach x in `ids' {
append using ``x''
}
tempfile count
save `count'
restore
merge n:1 id date using `count', keep(match master)
drop _merge
br
id date count30
1 1000 1
1 1002 3
1 1002 3
1 1200 1
1 1250 1
2 1050 1
2 1059 2
2 1085 2
On Thu, May 31, 2012 at 4:27 PM, Schaffer, Mark E <[email protected]> wrote:
>
> Hi all. "Variable running totals" isn't the best description of the
> problem, but it's not too far off.
>
> A colleague has written to me with the following problem. He has a
> panel dataset with two variables: id and date. (He has some other
> variables but those are the two that matter.) There may be multiple
> observations on id for a given date. The date variable is in Stata %td
> format (#days after 01jan1960). So it looks like this:
>
> id date
> 1 1000
> 1 1002
> 1 1002
> 1 1200
> 1 1250
> 2 1050
> 2 1059
> 2 1085
>
> ...etc.
>
>
> The question is, how to construct a variable that counts the number of
> observations that an individual (id) appears in the dataset up to 30
> days previously. If we call the variable count30, it would look like
> this:
>
> id date count30
> 1 1000 1
> 1 1002 2
> 1 1002 3
> 1 1200 1
> 1 1250 1
> 2 1050 1
> 2 1059 2
> 2 1085 2
>
> ...etc.
>
> I suspect there's an easy way of doing this, but the only ways I could
> think of involved brute force looping through observations.
>
> Any ideas?
>
> --Mark
>
>
> --
> Heriot-Watt University is the Sunday Times
> Scottish University of the Year 2011-2012
>
> Heriot-Watt University is a Scottish charity
> registered under charity number SC000278.
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/