Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Variable running totals

From	Jorge Eduardo Pérez Pérez <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: Variable running totals
Date	Thu, 31 May 2012 18:42:28 -0400

Observations 2 and 3 in this dataset are identical, so shouldn't the
count be 3 in both? Up to date 1002, id 1 has appeared 3 times: once
on date 1000 and twice on date 1002.

I hope that makes sense, but it is only a way of justifying that the
following code gives count30=3 for all the observations with
date==1002. It's not as simple as it could be, since it loops through
ids, but that is better than looping through observations.

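* Assumes id is numeric and takes nonnegative integer values
levelsof id, local(ids)
foreach x in `ids' {
	preserve
	keep if id==`x'
	* Deal with duplicates before -tsset-: dup holds the number of
	* extra copies of each date, then one row per date is kept
	duplicates tag date, gen(dup)
	duplicates drop date, force
	tsset date
	* Fill in the gaps so that each row is one calendar day
	tsfill
	gen t=_n
	* Require at least 60 obs for moving average
	if _N<60 {
		tsappend, add(`=60-_N')
	}
	* b = number of original observations on each day (0 on filled days)
	gen b=(id!=.)
	* Count duplicates too
	replace b=b+dup if dup!=.
	* Trailing 30-day mean (29 lags plus the current day) ...
	tssmooth ma count30=b, window(29 1 0)
	* ... rescaled to a 30-day sum
	replace count30=30*count30
	* Use a simple running count for the first 29 days,
	* where the 30-day window is still incomplete
	replace count30=sum(b) if t<30
	keep if id!=.
	keep id date count30
	* Store the tempfile name in local f`x' (f1, f2, ...) rather than
	* in a macro named by the bare id value
	tempfile f`x'
	save `f`x''
	restore
}
* Stack the per-id results into one file
preserve
clear
foreach x in `ids' {
	append using `f`x''
}
tempfile count
save `count'
restore
* Many-to-one merge: duplicate id-date rows all receive the same count
merge m:1 id date using `count', keep(match master)
drop _merge
br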

id	date	count30
1	1000	1
1	1002	3
1	1002	3
1	1200	1
1	1250	1
2	1050	1
2	1059	2
2	1085	2
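
As a cross-check on the table above, here is a minimal sketch of the
brute-force alternative: loop through observations and count directly.
It assumes the 30-day window means the current date plus the 29
preceding days, which is what window(29 1 0) implies, and it uses the
eight example observations from the original question. Its run time
grows with the square of the number of observations, so it is only
meant for verifying results on small datasets.

```stata
clear
input id date
1 1000
1 1002
1 1002
1 1200
1 1250
2 1050
2 1059
2 1085
end

gen count30 = .
forvalues i = 1/`=_N' {
	* Count same-id observations dated within the 29 days before,
	* or on, the date of observation i
	quietly count if id == id[`i'] & inrange(date, date[`i'] - 29, date[`i'])
	local n = r(N)
	quietly replace count30 = `n' in `i'
}
list, sepby(id)
```

On the example data this reproduces the count30 column shown above,
including 3 for both observations with date==1002.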


On Thu, May 31, 2012 at 4:27 PM, Schaffer, Mark E <[email protected]> wrote:
>
> Hi all.  "Variable running totals" isn't the best description of the
> problem, but it's not too far off.
>
> A colleague has written to me with the following problem.  He has a
> panel dataset with two variables: id and date.  (He has some other
> variables but those are the two that matter.)  There may be multiple
> observations on id for a given date.  The date variable is in Stata %td
> format (#days after 01jan1960).  So it looks like this:
>
> id      date
> 1       1000
> 1       1002
> 1       1002
> 1       1200
> 1       1250
> 2       1050
> 2       1059
> 2       1085
>
> ...etc.
>
>
> The question is, how to construct a variable that counts the number of
> observations that an individual (id) appears in the dataset up to 30
> days previously.  If we call the variable count30, it would look like
> this:
>
> id      date    count30
> 1       1000            1
> 1       1002            2
> 1       1002            3
> 1       1200            1
> 1       1250            1
> 2       1050            1
> 2       1059            2
> 2       1085            2
>
> ...etc.
>
> I suspect there's an easy way of doing this, but the only ways I could
> think of involved brute force looping through observations.
>
> Any ideas?
>
> --Mark
>
>
> --
> Heriot-Watt University is the Sunday Times
> Scottish University of the Year 2011-2012
>
> Heriot-Watt University is a Scottish charity
> registered under charity number SC000278.
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
>

