James Harris wrote:
> I'm running some simple survival analyses using
>
> sts list, failure at (1 12) by (gender ageband)
>
> My question is, can I access the results to use later in my do-file?
> For example, how would I pick out failure rate at time 12, and put it
> into a variable or matrix?
>
> Many thanks.
I'm replying to my own question, as all I got from StataList was "try
sts generate". I had a useful correspondance with someone who'd asked
StataList this question earlier, and then in turn have been contacted by
others who are dealing with the same problem.
The solution I've found works in some - but not all - cases. If anyone
is interested in generalising it, feel free!
- James.
The first method uses -tsset- to create the missing observations; then
carries forward the survival rates:
I'm sorry to say that I did not [get an answer from StataList],
which is unfortunate. I can tell you
how I addressed the problem since I had to get an answer .. I figured
that, since the failure function is not a model and is essentially
calculated at each particular failure time, I assumed that the failure
function didn't change between successive timepoints. So, essentially,
if I had a time 2 with failure rate A, and at the next timepoint 8 I
have A+x, I assume my failure rate is A for time 3,4,5,6, and 7. It is
not until timepoint 8 that I had the failure change. Likewise, if all
my obs for one group were censored at time 10, and for my other group
they were censored at time 22, the best estimate for my first group's
failure function remains constant between time 10 and 22.
Whether or not this line of reasoning applies to your problem, I can't
say. But that's what I did, because having talked with a few of my
colleagues it seemed reasonable in my case.
To implement this, I did a tsset and tsfill (to create an observation
for all missing timepoints
3-7, and filled the data accordingly, with a line something like the
following,
. bysort [grouping-variables] (_t): replace failure=failure[_n-1] if
missing(failure) & ~missing(failure[_n-1])
The tsset method almost worked for me, except that it creates too many
observations: I am working with a million or so hospitalisation records,
that span 15 years, with a timescale of 1 day, and my aged computer
simply ran out of memory. But the basic idea of carrying forward the
survival rates to dummy records seemed like the way to go.
So instead I created a file of observations that holds 1 record for each
person for each time that I want the survival rate for:
// set up the demographics file, with entries at the times of interest
use ".\stata data\temp\demographics", clear
// a file with a record for each unique person (identified by mast_enc)
describe
local BridgeDuration = 30
local DH_end = 366
local Yr2_end = 732 // two years
gen time1 = `BridgeDuration' // days
gen time2 = `DH_end'
gen time3 = `Yr2_end'
reshape long time, i(mast_enc) j(break)
compress
describe
save ".\stata data\temp\demographics30", replace
After the stset on my main dataset, I generate the survival parameter,
and the same time variable as in my 'demographics' file
sts generate survival = s
format survival %6.4g
gen time = _t
I append the 'demographics' file to my main dataset, sort by time, then
carryforward the survival rates:
append using ".\stata data\temp\demographics30"
sort mast_enc time _t // the sort by _t ensures that eg 30.x days goes
after 30.0 days
// mast_enc is the unique person ID
// at 30 days, use earlier survival if there is one for this person
// also carry forward the _st flag
// and the ageband
// unless previous record is also for a t=30
foreach BreakPoint in `BridgeDuration' `DH_end' `Yr2_end' /// the three
breakpoints: end of bridge; rest of 1st year; second year
{
bysort mast_enc (time): replace _st = _st[_n-1] ///
if (missing(_st) & (time == `BreakPoint') & (time[_n-1] ~=
`BreakPoint') ///
& (mast_enc == mast_enc[_n-1]))
bysort mast_enc (time): replace survival = survival[_n-1] ///
if ( missing(survival) & (time == `BreakPoint') & (time[_n-1]
~= `BreakPoint') ///
& (mast_enc == mast_enc[_n-1]))
//bysort mast_enc (time): replace ageband = ageband[_n-1] ///
// if ( (time == `BreakPoint') & (time[_n-1] ~= `BreakPoint') &
(mast_enc == mast_enc[_n-1]))
}
Each person in the dataset now has a survival rate at the three
specified times. It can be tabluated with something like
foreach BreakPoint in `BridgeDuration' `DH_end' `Yr2_end' {
table gender ageband if time == `BreakPoint', contents (min
survival freq)
}
if survival is decreasing over time. The 'min' function is needed if
some people actually have an event at one of my three times of
interest. Alternatively, you can create a dataset of the survival
values with something like
collapse (min) survival if inlist(time, `BridgeDuration' ,`DH_end'
,`Yr2_end') , by (ageband gender time )
The good news is that this method works well when each person only
becomes at risk once - in my case, a chance of death after a heart attack.
The bad news is that it fails dismally when each person may have
multiple entries and failure events - eg when I want the chance of a
heart attack given a previous heart attack. I've given up on solving
that version - if anyone has any suggestions I'd be very grateful.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/