On Monday, [email protected] wrote:
> I am working with a multi-record-per-case observational data set and have a
> problem. Specifically, it turns out that within cases, duration times for
> single records can span more than one year. Accordingly time-varying
> explanatory variables that carry different information by year are not
> accurate for that record. For example,
> ID Entry Exit pwom
> 4 01/14/82 07/08/86 50.2
> 4 07/08/86 12/31/86 52.6
> 4 01/01/87 04/30/87 57.5
> As you see, pwom varies for records 2 and 3 but not so for record 1, which
> covers a period of nearly 4 yrs but nonetheless carries a constant value
> for pwom. I cannot locate a procedure in Stata for splitting record 1.
> Any suggestions would be greatly appreciated. Thanks.
Splitting on calendar time can be tricky; at least, I have found it so.
Here is one way to do it.
First, let's read the data in:
clear
input ID str8 Entry str8 Exit pwom
4 "01/14/82" "07/08/86" 50.2
4 "07/08/86" "12/31/86" 52.6
4 "01/01/87" "04/30/87" 57.5
end
* Uppercase mask, as Stata 10 and later expect; 2050 makes "82" read as 1982
gen entry=date(Entry,"MDY", 2050)
format entry %td
gen exit=date(Exit,"MDY", 2050)
format exit %td
drop Entry Exit
list
Since I am going to use -stsplit- I need to -stset- the data with
an id variable, and that requires me to have a failure variable.
Since we don't have one in this sample dataset, I will make
up a fake one:
gen fail=1
stset exit, id(ID) fail(fail) origin(time entry) exit(time .)
list
I have used the option -origin(time entry)- since I am assuming that
risk begins at the date in the variable entry. I have used
the option -exit(time .)- since I don't want the subject to exit
the study until the end of the time in the last record (so this
allows the analysis to have multiple failures).
If we try to split now, -stsplit- will count the split points on the
scale of _t0 and _t. So what date does the 0 in _t0 refer to? 14jan1982.
But we don't want to count from Jan 14th. We want to count from Jan 1st
to Jan 1st each year.
So, what I will do is -stset- in a different way, so that
_t0 and _t are counting from a fixed date in the calendar, 01jan1960.
For that I use -enter()- rather than -origin()-, because then all times
are counted from 0, which is 01jan1960 in Stata's date format.
qui stset exit, id(ID) fail(fail) enter(time entry) exit(time .)
list
To get the splitting to occur on a particular date, say Jan 1st, 1983,
I need to know what number in the date format that corresponds to.
Since we need to split at each January 1st, I need to find the numbers
for a few of those:
di date("01/01/1983", "MDY")
di date("01/01/1984", "MDY")
di date("01/01/1985", "MDY")
di date("01/01/1986", "MDY")
di date("01/01/1987", "MDY")
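Rather than copying the numbers by hand, the same cut points could be collected in a local macro. This is only a sketch: the macro name cuts is made up here, and it assumes the splits are wanted at 1 January of 1983 through 1987, as above.

```stata
* Collect the %td value of 1 January for each year in a local macro.
* The macro name "cuts" is hypothetical; mdy() is Stata's built-in function.
local cuts
forvalues y = 1983/1987 {
    local cuts `cuts' `=mdy(1, 1, `y')'
}
* Show the list so it can be compared with the -di- results
display "`cuts'"
```

The macro could then be handed to -stsplit- as at(`cuts') in place of typed numbers, provided both commands run in the same do-file or interactive session.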
Now I know where to split:
stsplit year, at(8401 8766 9132 9497 9862)
replace year=1982 if year==0
replace year=1983 if year==8401
replace year=1984 if year==8766
replace year=1985 if year==9132
replace year=1986 if year==9497
replace year=1987 if year==9862
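As a possible shortcut for the relabelling, and assuming the data are still -stset- with -enter()- so that _t0 holds a calendar date, the calendar year of each record can be read off its start date with the year() function. The variable name calyear is made up for this sketch.

```stata
* _t0 is the start of each split record, in days since 01jan1960,
* so year() returns the calendar year in which the record begins.
* "calyear" is a hypothetical variable name.
gen calyear = year(_t0)
list ID _t0 _t calyear
```

This avoids one -replace- per year, which matters more when the data span many years.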
Now I just need to -stset- again, so that risk will begin at
entry, not Jan 1st, 1960:
replace exit=_t
replace entry=_t0
stset exit, id(ID) fail(fail) origin(time entry) exit(time .)
list
A useful check is to be sure that you have exactly the same information
from the -stset- at the end of all this, as you had at the start.
Even though you have more records, you should still have the same amount
of analysis time.
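One way to make that check concrete is to total the time at risk after the first -stset- and again after the last one, and compare the two. A minimal sketch, assuming the data are currently -stset-; the names atrisk and total_risk are hypothetical:

```stata
* Length of each record's at-risk interval, for records that stset uses
gen double atrisk = _t - _t0 if _st == 1
quietly summarize atrisk
* Keep the total in a scalar so it survives later commands
scalar total_risk = r(sum)
display "records used: " r(N) "   total time at risk: " total_risk
drop atrisk
```

Run at both points, the record count will differ after splitting, but the total should not. -stdescribe- reports similar summaries.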
-- May
[email protected]