Re: st: Carry forward an observation within a time frame
From: David Kantor <[email protected]>
To: [email protected]
Date: Wed, 09 Oct 2013 00:15:44 -0400
At 09:08 PM 10/8/2013, Benigno Rodriguez wrote:
Dear Stata listers:
I have a panel dataset that consists of multiple visits on multiple
subjects at some of which a CD4 value is obtained. My objective is
to carry forward the CD4 value to visits for which it is missing,
but only if a value is available within the 4 months prior.
I recognize the problem as one of spells, and have read Nick Cox's
excellent article on spells from 2007, as well as his column on
lists from 2002 (which I recognize as even more relevant to my
problem), but despite his heroic efforts at a foolproof introduction
to for and its variants, programming does not seem to get through my
thick skull easily. Could I get a hand with this, ideally not
involving code? Below is a relevant excerpt of the dataset, with
the desired result in the last column.
Thank you very much in advance.
patid   date         CD4   desired
1007    5-May-55     .     .
1007    1-Jan-00     .     .
1007    3-Apr-02     5     5
1007    8-Apr-02     .     5
1007    11-Apr-02    .     5
1007    13-May-02    .     5
1007    14-May-02    4     4
1007    17-Jun-02    9     9
1007    12-Nov-02    .     .
1007    27-Jan-03    6     6
1007    17-Mar-03    .     6
1007    14-Apr-03    0     0
There are two issues to be dealt with:
1: determining what is "4 months prior";
2: carrying values forward.
For the first matter, you need to decide what is meant by "4 months
prior". Is that 120 (or 121) days? Or is it in calendar months that are
separated by no more than 4 (e.g., April to August), regardless of the
day-of-month? Or is it within a span of four months, to the same
day-of-month (e.g., April 12 to August 12)?
The first option listed is easiest. You can generate a
date-difference variable:
by patid (date): gen int datediff = date-date[_n-1]
--then screen for datediff<=120 (or 121 or whatever).
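As a minimal sketch of the first option, assuming date is already a numeric Stata daily date (not a string), the screen can be stored in an indicator. The name within120 is made up for illustration, and 120 is just one of the cutoffs mentioned above:

```stata
* Sketch only: assumes date is a numeric daily date (%td).
* bysort sorts by patid and date before computing the gap to the
* previous visit of the same patient.
bysort patid (date): gen int datediff = date - date[_n-1]

* within120 is a hypothetical name. datediff is missing on each
* patient's first visit; missing compares as larger than any number,
* so the indicator is 0 there.
gen byte within120 = datediff <= 120
```

Here within120 would play the role of screening_condition in the carrying step.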
For the second option,...
gen int m = mofd(date)
by patid (date): gen int mdiff = m - m[_n-1]
--then screen on mdiff <=4
For the third option, use the m and mdiff defined above, and...
gen byte d = day(date)
by patid (date): gen byte ddiff = d- d[_n-1]
-- then screen on the condition mdiff<4 | (mdiff==4 & ddiff<=0)
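Put together, the third option might look like the sketch below. It again assumes date is a numeric daily date, and samespan4 is a made-up indicator name:

```stata
* Sketch only: month and day-of-month gaps to the previous visit.
gen int m = mofd(date)      // monthly date of each visit
gen byte d = day(date)      // day of month, 1 to 31
bysort patid (date): gen int mdiff = m - m[_n-1]
bysort patid (date): gen byte ddiff = d - d[_n-1]

* samespan4 is a hypothetical name. April 12 to August 12 gives
* mdiff == 4 and ddiff == 0, so it passes; April 12 to August 13 gives
* ddiff == 1, so it does not. First visits have missing mdiff and get 0.
gen byte samespan4 = mdiff < 4 | (mdiff == 4 & ddiff <= 0)
```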
When I write "screen on", I mean to use it in filtering the carrying
step -- that is, use it as screening_condition in what follows.
To do the carrying, you can do a direct replace operation, or use
carryforward (from SSC):
1: direct replace:
by patid (date): replace CD4 = CD4[_n-1] if mi(CD4) & _n>1 & screening_condition
2: carryforward:
by patid (date): carryforward CD4 if screening_condition, replace
----
Either of the carrying techniques can be modified to generate a
separate variable, rather than replacing the original.
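One point worth checking before you settle on a screening condition: the differences above measure the gap to the previous visit, and a direct replace cascades down the observations, so a value can travel across several short gaps and end up further than 4 months from the visit at which it was measured. If the window is meant to run from the last actual measurement, a sketch along the following lines may be closer to the request. It uses the 120-day definition and writes to a separate variable; lastdate and CD4_cf are made-up names:

```stata
* Sketch only: assumes date is a numeric daily date (%td).
* lastdate = date of the most recent visit with an observed CD4.
bysort patid (date): gen long lastdate = date if !mi(CD4)
bysort patid (date): replace lastdate = lastdate[_n-1] if mi(lastdate) & _n > 1

* Carry into a new variable, leaving the original CD4 untouched.
* If there is no earlier measurement, lastdate is missing and the
* comparison is false, so nothing is carried.
gen CD4_cf = CD4
bysort patid (date): replace CD4_cf = CD4_cf[_n-1] ///
    if mi(CD4_cf) & _n > 1 & date - lastdate <= 120
```

On the excerpt posted, this should reproduce the desired column: 12-Nov-02 stays missing because it is 148 days after the 17-Jun-02 measurement. The /// line continuation works in a do-file; at the command line, type the replace on one line.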
The various operations and expressions that I've outlined to obtain
the screening_condition can be folded into a single expression,
avoiding the creation of intermediary variables (m, d, mdiff, ddiff,
datediff). But it may be easier to manage in the way I've outlined.
You may want to generate an indicator variable for that purpose.
(If you formulate an expression -- involving [_n-1] -- it can go into
the direct replace operation; it can't exactly go into the
carryforward, though it may be possible to get the same effect with
the dynamic_condition option. It is probably easiest to generate an
indicator variable.)
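For illustration, here is roughly what the folded forms could look like, assuming date is a numeric daily date. The first command puts the 120-day screen straight into the direct replace; the second builds an indicator for the same-day-of-month screen without m, d, mdiff or ddiff. The name ok4m is made up:

```stata
* Sketch only. Option 1 folded into the replace; no datediff is created.
bysort patid (date): replace CD4 = CD4[_n-1] ///
    if mi(CD4) & _n > 1 & date - date[_n-1] <= 120

* Option 3 folded into a single indicator (ok4m is a hypothetical name).
* On a patient's first visit date[_n-1] is missing, both month
* comparisons are false, and ok4m is 0.
bysort patid (date): gen byte ok4m = ///
    (mofd(date) - mofd(date[_n-1]) < 4) | ///
    (mofd(date) - mofd(date[_n-1]) == 4 & day(date) <= day(date[_n-1]))
```

The two commands are alternatives rather than steps to run in sequence. The /// continuations are for use in a do-file.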
See -help dates- for an explanation of date() and mofd().
See -help carryforward- if you download that module.
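Everything above presumes that date is a numeric Stata date. If it is actually stored as a string such as "3-Apr-02", a conversion along these lines would be needed first. This is a sketch under assumptions: datestr is a made-up name for the string variable, and the 2020 is my guess at a sensible top year for resolving two-digit years:

```stata
* Sketch only: datestr is a hypothetical string variable like "3-Apr-02".
* The third argument is the top year: two-digit years resolve to the
* latest year not after it, so 55 becomes 1955 and 02 becomes 2002.
gen long date = date(datestr, "DMY", 2020)
format date %td
```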
HTH
--David