Matt's approach -drop-s only the last observation in each such spell
that is 3 long, the last two in each such spell that is 4 long,
and so forth.
-tsspell- on SSC is a convenience command for such problems. Its help
file is detailed, with several worked examples, including problems
similar to Raphael's.
As its name implies, you must -tsset- your data before use. That is
painless:
. tsset id timevar
Now define spells as sequences of zeros:
. tsspell, cond(count == 0)
-tsspell- automatically respects the panel structure of your data. It
creates new variables with default names _spell, _seq, and _end. See the
help for explanation if these names are not obvious.
The length of each spell can be calculated like this:
. egen length = max(_seq), by(id _spell)
Now we are home and dry:
. drop if length >= 3
The variables _spell, _seq, and _end can be -drop-ped if they are of no
further use.
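For anyone who wants to try this, here is a minimal sketch that runs the steps above on the example data from the original question. It assumes -tsspell- has been installed from SSC and that the data have no gaps in timevar within id; the final -drop- of the working variables is optional.

```stata
* ssc install tsspell    // needed once; tsspell is user-written (SSC)
clear
input id timevar count
1 1 56
1 2 2
1 3 0
1 4 0
1 5 0
1 6 0
1 7 5
1 8 0
1 9 0
2 1 230
2 2 0
2 3 0
2 4 19
end

tsset id timevar
* spells are runs of consecutive zeros within each id
tsspell, cond(count == 0)
* spell length = largest sequence number within each spell
* (observations outside any spell have _seq == 0, so length == 0)
egen length = max(_seq), by(id _spell)
list id timevar count _spell _seq length, sepby(id)
drop if length >= 3
* tidy up the working variables
drop _spell _seq _end length
```

With these data, only the four zeros at timevar 3 to 6 for id 1 should be dropped, leaving 9 of the 13 observations; the two shorter runs of zeros stay in.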
Alternatively, this article spells out the principles of doing it
yourself:
SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        Q2/07   SJ 7(2):249--265                                 (no commands)
        shows how to handle spells with complete control over
        spell specification
-findit spell- would have pointed to these and other stuff.
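As a rough illustration of the do-it-yourself route, here is a sketch using only official Stata commands. The variable names isstart, spell, and runlen are made up for this example, and the code assumes that count is never missing and that there are no gaps in timevar within id (unlike -tsspell-, it does not check for gaps).

```stata
clear
input id timevar count
1 1 56
1 2 2
1 3 0
1 4 0
1 5 0
1 6 0
1 7 5
1 8 0
1 9 0
2 1 230
2 2 0
2 3 0
2 4 19
end

* mark the first observation of each run of zeros within id
bysort id (timevar): gen byte isstart = count == 0 & (_n == 1 | count[_n-1] != 0)
* number the runs within id; observations outside any run get 0
by id: gen spell = cond(count == 0, sum(isstart), 0)
* run length = number of observations sharing the same id and run number
bysort id spell: gen runlen = cond(spell > 0, _N, 0)
drop if runlen >= 3
sort id timevar
```

The running sum of the start markers gives each run its own number within id, so _N under -by id spell:- is the run length. On the example data this should drop the same four observations as the -tsspell- route.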
Nick
Matt Spittal
One way of approaching this is to use a combination of the -bysort-
prefix and explicit subscripting. For instance:
bysort id (timevar): gen nzero = 1 if count[_n - 2] == 0 & count[_n - 1] == 0 & count == 0
should identify the observations where there are three consecutive zeros
for each person. (If it isn't quite what you want, a variation on this
will do the trick.) Then
drop if nzero == 1
will exclude these observations from the dataset. Alternatively,
something like
xtpoisson count if nzero != 1
(or whatever commands you are using) will keep all the observations in
the dataset, but exclude them from the analysis. A very good
description of subscripting within groups is given in the User's Guide
in section 13.7.2, for Stata version 10.
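One possible variation of that kind, offered only as a sketch: look both backwards and forwards, so that every zero in a run of three or more is flagged, not just the third and later ones. It assumes count is never missing and that the observations within id are consecutive in time. The /// line continuations work in a do-file, not when typed interactively.

```stata
clear
input id timevar count
1 1 56
1 2 2
1 3 0
1 4 0
1 5 0
1 6 0
1 7 5
1 8 0
1 9 0
2 1 230
2 2 0
2 3 0
2 4 19
end

* a zero belongs to a run of 3+ zeros if it has two zeros before it,
* one on each side, or two after it; subscripts that point outside
* the id group return missing, which never equals 0
bysort id (timevar): gen byte nzero = count == 0 & ///
    ((count[_n-2] == 0 & count[_n-1] == 0) | ///
     (count[_n-1] == 0 & count[_n+1] == 0) | ///
     (count[_n+1] == 0 & count[_n+2] == 0))

list id timevar count if nzero, sepby(id)
drop if nzero
```

Here nzero is a 0/1 indicator, so -if nzero- and -if !nzero- can be used directly in later commands. On the example data, the flagged observations should be timevar 3 to 6 for id 1.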
Raphael Fraser
I have longitudinal data with "id" as unique identifier, "timevar" as
the time variable and an outcome variable I call "count." The timevar
contains the elapsed time in minutes. I would like to exclude all
zeros where there are 3 or more consecutive zeros for each person. Can
anyone help?
id  timevar  count
 1        1     56
 1        2      2
 1        3      0
 1        4      0
 1        5      0
 1        6      0
 1        7      5
 1        8      0
 1        9      0
 2        1    230
 2        2      0
 2        3      0
 2        4     19