I don't see how Naomi's solution gets at the most
difficult part of this problem, which is to take account
of the irregularity of times observed in counting over
the last # days.

-egen, count()-, or even -egen, total()-, which is a better
bet for similar problems, is more or less useless for this
kind of problem, as the relevant time interval varies
from observation to observation.

I too am probably missing something simple. However,
when all inspiration fails, try brute force.

The brute force approach is quite easy to program
and at worst requires a single loop over the observations.

An approach discussed at length in

Cox, N.J. 2007. Making it count. Stata Journal 7(1)

is to set up a variable, loop over possibilities using
-count- for each observation, and replace each value of that
variable by the result.
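This is built into the following program:

*! 1.0.0 NJC 22 July 2007
program count_recent
    version 8
    syntax [if] [in], Lag(numlist int max=1 >0) Generate(str)

    quietly {
        confirm new var `generate'

        // mark the observations to use; stop if there are none
        marksample touse
        count if `touse'
        if r(N) == 0 error 2000

        // pick up panel and time variables from a prior -tsset-
        tsset
        local p "`r(panelvar)'"
        local t "`r(timevar)'"

        // no panel variable: treat all observations as one panel
        if "`p'" == "" {
            tempvar p
            gen byte `p' = 1
        }

        gen `generate' = .

        // for each observation used, count the observations used that
        // are in the same panel and 1 to lag time units earlier
        forval i = 1/`=_N' {
            if `touse'[`i'] {
                count if `touse' ///
                    & inrange(`t'[`i'] - `t', 1, `lag') ///
                    & `p' == `p'[`i']
                replace `generate' = r(N) in `i'
            }
        }
    }
end
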
What we are counting, for each observation, is
how many observations are

(a) relevant (by default all observations). This is determined
by any -if- or -in- conditions.

(b) within 1 to -lag- (compulsory option) time units previous

(c) in the same panel (whenever there is panel structure)
-- you don't quite say this is what you want, but I guess
it's true.

I assume a prior -tsset-.

So, examples could be

    tsset ID Date
    count_recent , lag(30) generate(prev30)
    count_recent if Response == 1, lag(60) generate(pos_prev60)
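
As a cross-check of the counting rule outside Stata, here is a minimal
sketch in Python. It is not part of the program above: the function name
and its arguments (ids, times, lag, touse) are illustrative assumptions,
and the data are just the six sample rows from Andrew's question. An
observation excluded by the condition gets None, mirroring the missing
value that the program leaves for observations not selected by -if- or -in-.

```python
def count_recent(ids, times, lag, touse=None):
    """Brute-force count, for each observation, of earlier observations.

    For each flagged observation i, count the flagged observations j in the
    same panel with 1 <= times[i] - times[j] <= lag. Observations that are
    not flagged are left as None (the analogue of Stata's missing).
    """
    n = len(ids)
    if touse is None:
        touse = [True] * n  # by default every observation is relevant
    result = [None] * n
    for i in range(n):
        if not touse[i]:
            continue
        result[i] = sum(
            1
            for j in range(n)
            if touse[j]
            and ids[j] == ids[i]
            and 1 <= times[i] - times[j] <= lag
        )
    return result


# Sample rows from the question (ID, Date, Response)
ids = [1, 1, 1, 1, 1, 2]
dates = [37200, 37210, 37215, 37229, 37231, 37201]
response = [1, 0, 1, 1, 0, 0]

# All contacts in the previous 30 days
prev30 = count_recent(ids, dates, 30)

# Positive responses in the previous 60 days, among positive responses only
pos_prev60 = count_recent(ids, dates, 60, [r == 1 for r in response])

print(prev30)      # [0, 1, 2, 3, 3, 0]
print(pos_prev60)  # [0, None, 1, 2, None, None]
```

Note that in the second call, as with the -if- condition in the second
Stata example, results are produced only for the observations that
themselves satisfy the condition.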
Nick
Naomi Levy
> I am no expert here, and there is likely to be a much easier
> way to do this than what I am suggesting, but this is what I
> would do:
>
> I would -reshape- your data from long form to wide form so that
> each row is an ID and the responses on each day of contact
> become separate variables.
>
> The new form would look like this:
>
> ID Response37200 Var137200 Var237200 Response37210 Var137210 Var237210
> 1  1             1         1         0             2         1
>
> Before you do this I suggest dropping any variables you don't need
> for this analysis and renaming variables so their names are shorter
> (e.g. response to r). Also, if all you are interested in for the
> analysis are more recent dates of contact, you can drop all the
> data for prior dates of contact.
>
> the syntax for reshape is:
> reshape wide [varlist], i(id) j(date)
>
> once you've done that, you can just generate a new variable that
> sums across the responses (once counting non-missing responses,
> and once counting positive responses).
>
> after doing that, you can easily reshape the data back to long form:
> reshape long [varlist], i(id) j(date)
Andrew Stocking
> I have an unbalanced panel of subjects who have been contacted very
> irregularly over the past 5 years. Total contacts range from 40-250
> during the 5 year period depending on the person. I'd like to create
> two variables: one that counts the total number of contacts in the
> last 30 or 60 days and a second that sums the number of positive
> responses over the same 30 or 60 days. For each contact there could
> be anywhere from 0-15 contacts in the last 30 days.
>
> My data looks like:
> ID   Date    Response   Var1   Var2
> 1    37200   1          1      1
> 1    37210   0          2      1
> 1    37215   1          3      2
> 1    37229   1          4      3
> 1    37231   0          4      2
> 2    37201   0          1      0
> .....
>
> I can't make egen count() work for me (or really anything else).