Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Survival analysis question
From: Steven Samuels <[email protected]>
To: [email protected]
Subject: Re: st: Survival analysis question
Date: Thu, 4 Nov 2010 15:25:32 -0400
Al-
I think that the appropriate model is the "conditional risk set model
(time from the previous event)" of Section 3.2.4 of the FAQ
(http://www.stata.com/support/faqs/stat/stmfail.html#cond2).
The -stset- for that model excludes the -id()- option. If you
omit "id(id)" from your -stset- statement, you get the correct total
analysis time for the first two ids. For all eight subjects, the
results of -stset- followed by -stsum-, -stcox- and -stcurve- are
identical for:
stset time, failure(fail) exit(time .) time0(time0) //the formula of sec 3.2.4 in the FAQ
and
stset time, failure(fail) //my original suggestion
Steve
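A minimal sketch of the comparison described above, not part of the original message. It assumes the sixteen records from the original post are typed in by hand, uses the variable names from Al's listing (id, post, fail, t), and sets time0 to 0 for every record. Total time at risk is computed directly as the sum of _t - _t0 over the records that -stset- keeps.

```stata
* Sketch only: data typed in from the table in the original post.
* time0 = 0 for every record is an assumption of this sketch.
clear
input id post fail t
1 0 1 169
1 1 1 141
2 0 1 114
2 1 1 84
3 0 1 564
3 1 1 296
4 0 1 168
4 1 1 332
5 0 1 215
5 1 1 50
6 0 0 900
6 1 1 196
7 0 0 900
7 1 1 298
8 0 0 900
8 1 1 280
end
gen time0 = 0

* (a) FAQ Section 3.2.4 form: no id(), each record runs from time0 to t
stset t, failure(fail) exit(time .) time0(time0)
gen double span_a = _t - _t0 if _st == 1
quietly summarize span_a, meanonly
scalar risk_a = r(sum)

* (b) original suggestion: no id(), every record starts its clock at 0
stset t, failure(fail)
gen double span_b = _t - _t0 if _st == 1
quietly summarize span_b, meanonly
scalar risk_b = r(sum)

display "total time at risk: (a) " risk_a "   (b) " risk_b
```

If the data are entered as shown, both totals should equal 5607, the sum of the sixteen recorded times.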
For the -stcox- analysis of
also gives the correct total failure time and
On Nov 4, 2010, at 1:23 PM, Feiveson, Alan H. (JSC-SK311) wrote:
Hi Steve - OK - So I tried what was suggested in the link. To make
this really simple I just did -stset- for the first two id's (with all
failures):
. gen time0=0
. list id treat post fail t ttrxt time0 if id<=2, sepby(id)

     +------------------------------------------------+
     | id   treat   post   fail     t   ttrxt   time0 |
     |------------------------------------------------|
  1. |  1     pre      0      1   169     169       0 |
  2. |  1    post      1      1   141     310       0 |
     |------------------------------------------------|
  3. |  2     pre      0      1   114     114       0 |
  4. |  2    post      1      1    84     198       0 |
     +------------------------------------------------+
. stset t, id(id) failure(fail) exit(time .) enter(time0) if(id<=2)
id: id
failure event: fail != 0 & fail < .
obs. time interval: (t[_n-1], t]
enter on or after: time time0
exit on or before: time .
if: id<=2
------------------------------------------------------------------------------
16 total obs.
12 ignored per request (if(), etc.)
------------------------------------------------------------------------------
4 obs. remaining, representing
2 subjects
4 failures in multiple failure-per-subject data
283 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 169
But the total time at risk should be 169 + 141 + 114 + 84 = 508 (not
283). Note 283 = 169 + 114 is the sum of the "pre" failure times.
Now, I redefine my "time0" variable to be where the previous test left
off and use the cumulated time as the time variable:
. replace time0=ttrxt[_n-1] if post==1
(8 real changes made)
. stset ttrxt, id(id) failure(fail) exit(time .) enter(time0) if(id<=2)
id: id
failure event: fail != 0 & fail < .
obs. time interval: (ttrxt[_n-1], ttrxt]
enter on or after: time time0
exit on or before: time .
if: id<=2
------------------------------------------------------------------------------
16 total obs.
12 ignored per request (if(), etc.)
------------------------------------------------------------------------------
4 obs. remaining, representing
2 subjects
4 failures in multiple failure-per-subject data
508 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 310
and I get the correct total time at risk.
However, equivalently, I could do what I did before without the
"enter(time0)":
. stset ttrxt, id(id) failure(fail) exit(time .) if(id<=2)
id: id
failure event: fail != 0 & fail < .
obs. time interval: (ttrxt[_n-1], ttrxt]
exit on or before: time .
if: id<=2
------------------------------------------------------------------------------
16 total obs.
12 ignored per request (if(), etc.)
------------------------------------------------------------------------------
4 obs. remaining, representing
2 subjects
4 failures in multiple failure-per-subject data
508 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 310
and I still get the correct time at risk.
Am I missing something? Shouldn't the total time at risk just be the
sum of the "t's"?
Al
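One reading of the 283 figure, offered as a sketch and not as part of the original exchange: when id() is specified, -stset- orders each subject's records by the time variable and treats them as back-to-back spans (t[_n-1], t]. With the uncumulated t, subject 1 then contributes (0, 141] and (141, 169], which is 169 in total, not 169 + 141. The code below assumes only the four records for ids 1 and 2 from the listing above and lists the spans that -stset- builds.

```stata
* Sketch only: the four records for ids 1 and 2 from the listing above.
clear
input id post fail t
1 0 1 169
1 1 1 141
2 0 1 114
2 1 1 84
end

* With id(), records within a subject become consecutive spans (t[_n-1], t]
stset t, id(id) failure(fail) exit(time .)
sort id _t
list id post t _t0 _t _d, sepby(id)

* Total analysis time is the sum of the span lengths
gen double span = _t - _t0 if _st == 1
quietly summarize span, meanonly
scalar risk_id = r(sum)
display "time at risk with id(): " risk_id
```

The listing should show each subject's shorter time as the first span and only the remainder up to the longer time as the second, which is why the total matches the larger time per subject (169 + 114 = 283).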
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Steven Samuels
Sent: Thursday, November 04, 2010 11:05 AM
To: [email protected]
Subject: Re: st: Survival analysis question
--
Al and Chris:
I should correct a previous statement of mine. You do formally have
multiple-failure data, with a not-at-risk gap between test dates. But
I think that the proper analysis is a "time from previous entry"
analysis, as in Section 3.2.4 of the FAQ
(http://www.stata.com/support/faqs/stat/stmfail.html#cond2).
The approach there of putting the second test data into a separate
stratum won't work, because you want to compare the first and second
times.
Steve
Steve - I think there is a communication problem here. The event is
a subject reaching a state of presyncope during an upright tilt.
Subjects are given the tilt test with Treatment 1 ("pre"), then one
week later they are given the test with Treatment 2 ("post").
Subjects aren't at risk during the week in between because they
aren't doing the tilt test. But I see there is no way you would know
this from the data alone. Therefore I would like to claim that in
effect "times" can be considered as building up consecutively. Does
this make sense?
Al
It doesn't make sense to me, Al. Assume that there was no treatment
(or that the treatments were the same). For the times to be considered
as "building up consecutively," an individual's inherent survival
curve for the second test would continue where the first curve left
off. The length of time between the two tests makes this very
unlikely. Too many (unmeasured) factors that affect response will
differ between the tests. I think this would be true even if the tests
were separated by just a few hours, though here issues of treatment
order, carry-over, changed physiological state, and prior outcome
would also enter.
Put it another way: Suppose you were measuring an outcome that was not
censored. Wouldn't you do a standard paired-data analysis? Let's see
what happens if I do this, ignoring the censoring, and compare the
results to those from a clustered regression of the individual times.
. bys subjectid: gen diff = time[2] - time[1]
. preserve
. bys subjectid: keep if _n==1
(8 observations deleted)
. mean diff //paired analysis
Mean estimation Number of obs = 8
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
diff | -281.625 114.6071 -552.6277 -10.62231
--------------------------------------------------------------
. restore
. reg time treatment, cluster(subjectid)

Linear regression                               Number of obs =      16
[output skipped]
                            (Std. Err. adjusted for 8 clusters in subjectid)
------------------------------------------------------------------------------
             |               Robust
        time |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   treatment |  -281.625   118.6296    -2.37   0.049    -562.1394   -1.110568
       _cons |    491.25   133.6418     3.68   0.008     175.2374    807.2626
------------------------------------------------------------------------------
The point estimates are the same, and the standard errors are close.
(In fact, if you jackknife the clusters, the standard errors are
identical.) By analogy, clustered -stcox- on the individual times is
the way to go. The fact that you can't get sensible survival curves
for your approach just reinforces this conclusion.
Steve
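A minimal sketch of the clustered -stcox- on the individual times recommended above, not part of the original message. It assumes the data from the original post are typed in by hand; the coding treatment 0 = pre, 1 = post is inferred from the regression output above (the constant, 491.25, is the mean of the "pre" times).

```stata
* Sketch only: data typed in from the table in the original post.
* treatment: 0 = pre, 1 = post (inferred coding); fail: 1 = failed.
clear
input subjectid treatment fail t
1 0 1 169
1 1 1 141
2 0 1 114
2 1 1 84
3 0 1 564
3 1 1 296
4 0 1 168
4 1 1 332
5 0 1 215
5 1 1 50
6 0 0 900
6 1 1 196
7 0 0 900
7 1 1 298
8 0 0 900
8 1 1 280
end

* Each test starts its own clock at 0, so no id() option
stset t, failure(fail)
stsum, by(treatment)

* Cox model with standard errors clustered on subject
stcox treatment, vce(cluster subjectid)

* Fitted survival curves for the two treatments
stcurve, survival at1(treatment=0) at2(treatment=1)
```

Because every record enters at time 0 under this setup, the fitted curves would be expected to start at 1 on a time axis that runs over the length of a single test.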
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Steven Samuels
Sent: Wednesday, November 03, 2010 2:40 PM
To: [email protected]
Subject: Re: st: Survival analysis question
--
Al,
I don't think that the two times are consecutive: they are recorded in
seconds, but the two observations on each subject were separated
by a week.
Steve
On Nov 3, 2010, at 2:50 PM, Feiveson, Alan H. (JSC-SK311) wrote:
Steve - In my opinion this is multiple failure data. Each subject is
subjected to two consecutive exposures, and a subject can "fail" on
none, either, or both of these tests. So the variable ttrxt at a given
observation is the total time that the particular subject has been at
risk up through that observation. Therefore I think the stset command
. stset ttrxt, id(id) failure(fail) exit(time .)
id: id
failure event: fail != 0 & fail < .
obs. time interval: (ttrxt[_n-1], ttrxt]
exit on or before: time .
------------------------------------------------------------------------------
16 total obs.
0 exclusions
------------------------------------------------------------------------------
16 obs. remaining, representing
8 subjects
13 failures in multiple failure-per-subject data
5607 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 1198
is correct. I agree that ideally, one should try a frailty model on
this data, but it doesn't work well with only 8 subjects.
Al Feiveson
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Steven Samuels
Sent: Wednesday, November 03, 2010 12:35 PM
To: [email protected]
Subject: Re: st: Survival analysis question
Chris Westby:
You don't have multiple-failure data, because the start time for the
two tests should be zero. The correct statement is:
stset t, failure(fail)
This will change the -stcox- results as well. Also try -stsum,
by(treatment)- after the two versions of -stset-. I suggest that you
consider the -shared- option in -stcox- to allow for the possibility
of person-specific baseline hazards. Note that eight subjects is
probably not enough for the standard errors to be reliable.
Steve
Steven J. Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783
On Nov 3, 2010, at 8:35 AM, Westby, Christian Michael. (JSC-SK)[USRA]
wrote:
Dear Statalisters,
I am working on comparing survival times in one group of subjects
before and after treatment and am having a hard time with the "stset"
code.
I am using the following data set, in which the two tests were
separated by 1 week. t is the time on task before and after treatment
(seconds), ttrxt is a cumulative time (calculated to prevent time from
being treated as continuous), and fail is coded 0=completed, 1=not
completed.
subjectid   treatment   fail           t   ttrxt
-----------------------------------------------------------------
1           pre         failed       169     169
1           post        failed       141     310
2           pre         failed       114     114
2           post        failed        84     198
3           pre         failed       564     564
3           post        failed       296     860
4           pre         failed       168     168
4           post        failed       332     500
5           pre         failed       215     215
5           post        failed        50     265
6           pre         completed    900     900
6           post        failed       196    1096
7           pre         completed    900     900
7           post        failed       298    1198
8           pre         completed    900     900
8           post        failed       280    1180
-----------------------------------------------------------------
I used
. stset ttrxt, id(subjectid) failure(fail) exit(time .)
id: subjectid
failure event: fail != 0 & fail < .
obs. time interval: (ttrxt[_n-1], ttrxt]
exit on or before: time .
------------------------------------------------------------------------------
16 total obs.
0 exclusions
------------------------------------------------------------------------------
16 obs. remaining, representing
8 subjects
13 failures in multiple failure-per-subject data
5607 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 1198
I then ran
. stcox treatment, cluster(subjectid)
failure _d: fail
analysis time _t: ttrxt
exit on or before: time .
id: subjectid
Iteration 0: log pseudolikelihood = -20.175132
Iteration 1: log pseudolikelihood = -18.079165
Iteration 2: log pseudolikelihood = -18.026011
Iteration 3: log pseudolikelihood = -18.025935
Refining estimates:
Iteration 0: log pseudolikelihood = -18.025935
Cox regression -- no ties

No. of subjects      =            8             Number of obs    =        16
No. of failures      =           13
Time at risk         =         5607
                                                Wald chi2(1)     =      4.22
Log pseudolikelihood =   -18.025935             Prob > chi2      =    0.0399

                             (Std. Err. adjusted for 8 clusters in subjectid)
------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   treatment |   4.610013   3.428317     2.05   0.040     1.073226    19.80218
------------------------------------------------------------------------------
I believe that the output and results are accurate; however, I am
unable to get Stata to correctly graph the survival curves using the
following code:
. stcurv, surv at1(treatment=0) at2(treatment=1)
The resulting graph incorrectly plots both groups starting at less
than 100% at time=0, and the x-axis scale is incorrect.
Any thoughts?
Chris
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/