In the spotlight: Interval-censored survival data—model fitting and beyond
What are interval-censored data?
Survival data often contain censored observations for which time to an event of interest is not observed exactly. Censored observations can be right-censored, left-censored, or interval-censored. An observation is right-censored if we know that the event of interest happened after the observed time. It is left-censored if we know that the event happened before the observed time. It is interval-censored if we know only that the event happened within some observed time interval. The term interval-censored data is used in general to refer to data that might be right-censored, left-censored, or interval-censored.
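To make the three cases concrete, it helps to see how each type of observation enters a parametric likelihood. The following is a standard textbook sketch rather than a formula quoted from the Stata documentation; the notation (density $f$, survivor function $S$, lower and upper bounds $t_{l,i}$ and $t_{u,i}$, parameter vector $\theta$) is assumed here for illustration.

```latex
L_i(\theta) =
\begin{cases}
f(t_i;\theta)                          & \text{uncensored: event observed exactly at } t_i \\
1 - S(t_{u,i};\theta)                  & \text{left-censored: event before } t_{u,i} \\
S(t_{l,i};\theta)                      & \text{right-censored: event after } t_{l,i} \\
S(t_{l,i};\theta) - S(t_{u,i};\theta)  & \text{interval-censored: event in } (t_{l,i},\, t_{u,i}]
\end{cases}
```

The full likelihood is the product of these contributions over all observations, which is why treating an interval bound as if it were an exact event time can distort the estimates.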
Interval-censored survival data arise in many areas, including medical, epidemiological, financial, and sociological studies. A common example is a clinical trial where patients are tested or measured periodically to evaluate if the event of interest has happened. We may not observe the exact time of the event, but we know that it happened before an evaluation, after an evaluation, or between two evaluations. The same applies to many other examples, such as unemployment duration in economic data, time of weaning in demographic data, or time to obesity in epidemiological data. Ignoring interval-censoring may lead to biased estimates.
In Stata, we can fit parametric models to interval-censored survival-time data using the stintreg command. stintreg supports different distributions and parameterizations, as well as the modeling of ancillary parameters and stratification. The command can analyze data that include all types of censoring, and it can also analyze current status data in which the event of interest is known to occur only before or after an observed time.
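A minimal sketch of how two bound variables can encode each censoring type. It assumes the convention that a missing (or zero) lower bound marks a left-censored observation, a missing upper bound marks a right-censored observation, and equal bounds mark an exactly observed time. The toy values and the variables id and ctype are hypothetical; see [ST] stintreg for the authoritative rules.

```stata
* Toy data (hypothetical values): one row per censoring type
clear
input id ltime rtime
1  .  5
2  4  8
3 10  .
4  6  6
end

* Classify each row from its bounds (ctype is an illustrative helper)
generate str12 ctype = "interval"
replace ctype = "left"       if missing(ltime) | ltime == 0
replace ctype = "right"      if missing(rtime)
replace ctype = "uncensored" if ltime == rtime & !missing(ltime)

list id ltime rtime ctype
```

Under the same assumed convention, current status data would contain only rows like 1 and 3, because each subject is examined once and the event is known only to have happened before or after that examination.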
Fit a model
We want to study the effect of two breast cancer treatments (treat) on breast retraction, which is a cosmetic deterioration experienced by some breast cancer patients. These patients were treated with either radiotherapy alone or radiotherapy plus adjuvant chemotherapy. Breast retraction was assessed at each follow-up visit to the doctor, and visits occurred at different times for different patients. The exact times of breast retraction are not observed, but they are known to fall in time intervals whose left and right bounds are recorded in variables ltime and rtime.
We fit a Weibull model of time to breast retraction as a function of treatment using stintreg.
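. stintreg i.treat, interval(ltime rtime) distribution(weibull)

Weibull PH regression                           Number of obs     =         94
                                                       Uncensored =          0
                                                    Left-censored =          5
                                                   Right-censored =         38
                                                   Interval-cens. =         51

                                                LR chi2(1)        =      10.93
Log likelihood = -143.19228                     Prob > chi2       =     0.0009

------------------------------------------------------------------------------
             | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |
 Radio+Chemo |   2.498526   .7069467     3.24   0.001     1.434961    4.350383
       _cons |   .0018503   .0013452    -8.66   0.000      .000445     .007693
-------------+----------------------------------------------------------------
       /ln_p |   .4785787   .1198973     3.99   0.000     .2435843     .713573
-------------+----------------------------------------------------------------
           p |   1.613779   .1934877                      1.275814    2.041272
         1/p |   .6196635    .074296                      .4898907    .7838134
------------------------------------------------------------------------------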
We find that the hazard of breast retraction for patients who received radiotherapy plus chemotherapy is about 2.5 times the hazard for patients who received radiotherapy alone. We can now evaluate whether this model fits well and further explore the results.
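For readers who want to connect the reported numbers to a formula, here is a sketch of the model under the usual Weibull proportional-hazards parameterization. We assume this parameterization because it is consistent with the reported values; the symbols $\beta_0$, $\beta_1$, and $p$ are our notation for the constant, the treatment coefficient, and the shape parameter.

```latex
h(t \mid \text{treat}) = p\, t^{\,p-1} \exp(\beta_0 + \beta_1\,\text{treat}),
\qquad
S(t \mid \text{treat}) = \exp\!\left\{-\exp(\beta_0 + \beta_1\,\text{treat})\; t^{\,p}\right\}

\frac{h(t \mid \text{treat}=1)}{h(t \mid \text{treat}=0)} = \exp(\beta_1) \approx 2.50,
\qquad
p = \exp(\texttt{/ln\_p}) = \exp(0.4785787) \approx 1.61
```

The hazard ratio does not depend on $t$, which is the proportional-hazards assumption, and an estimated $p$ greater than 1 implies a hazard that increases over time.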
Use diagnostic tools
stintreg provides two types of residuals to visually assess the appropriateness of the fitted models.
We can use predict with the mgale option to obtain martingale-like residuals. Plotting these residuals against the patient's age (age) lets us check visually whether age should be included in the model.
. predict mg, mgale
. scatter mg age
The scatterplot does not show any systematic trend, indicating that age is not needed in the model. We can produce scatterplots of mg against other variables of interest to identify potential omitted predictors.
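A trend in a residual scatterplot can be hard to judge by eye, so one option is to overlay a smoother. The sketch below assumes the Weibull model is still the active estimation result and that mg was already created by predict as above; it uses only standard twoway plot types.

```stata
* Overlay a lowess smooth on the residual scatterplot.
* Assumes mg already exists (created earlier with: predict mg, mgale).
twoway (scatter mg age) (lowess mg age), yline(0) ///
    ytitle("Martingale-like residual") xtitle("Age")
```

A smooth that stays roughly flat around zero supports leaving the variable out, while a clear slope or curve would suggest adding it, possibly in a transformed form.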
To assess the goodness of fit of the model visually, we use the estat gofplot command, which plots the Cox–Snell residuals versus the estimated cumulative hazard function corresponding to these residuals. If the model fits the data well, the plotted estimated cumulative hazards should be close to the reference line, which is formed by the Cox–Snell residuals.
. estat gofplot, title("Interval-censored Weibull regression")
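The logic behind this plot can be sketched with the textbook definition of the Cox–Snell residual for an exactly observed time. With interval-censored data the event time is not observed, so the residuals that stintreg computes are adapted versions of this quantity; the expression below is only the familiar special case, written in our own notation.

```latex
r_i = \hat{H}(t_i \mid x_i) = -\ln \hat{S}(t_i \mid x_i)

\text{If the model is correct, } r_i \text{ behaves like a draw from a unit exponential, so }
H_r(r) = r .
```

This is why the estimated cumulative hazard of the residuals, plotted against the residuals themselves, should follow the 45-degree reference line when the model fits.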
We can also visually compare our original Weibull model with an exponential model. We fit the model using the exponential distribution and obtain the goodness-of-fit plot.
. quietly stintreg i.treat, interval(ltime rtime) distribution(exponential)
. estat gofplot, title("Interval-censored exponential regression")
Comparing the two plots produced by estat gofplot, we conclude that the model with the Weibull distribution fits the data better than the model with the exponential distribution.
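The visual comparison can be complemented with a formal one, because the exponential model is the Weibull model with the shape parameter p fixed at 1. The sketch below has not been run against these data and assumes that lrtest and estimates stats behave after stintreg as they do after other maximum likelihood estimators; the stored names weib and expo are arbitrary.

```stata
* Fit and store both models (same specification as in the text)
quietly stintreg i.treat, interval(ltime rtime) distribution(weibull)
estimates store weib

quietly stintreg i.treat, interval(ltime rtime) distribution(exponential)
estimates store expo

* Likelihood-ratio test of the exponential model against the Weibull model
lrtest expo weib

* Log likelihoods, AIC, and BIC side by side
estimates stats weib expo
```

The Weibull output shown earlier already points the same way: the z statistic for /ln_p is 3.99, which is evidence against ln(p) = 0 and therefore against p = 1.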
Not having found any evidence against the Weibull model, we refit the Weibull model and see what else it tells us.
Interpret and visualize results
We have many tools available for interpreting and visualizing results.
We use predict to obtain the expected median survival time for both treatments. Then, we tabulate the results to compare the two types of treatment:
. quietly stintreg i.treat, interval(ltime rtime) distribution(weibull)
. predict m, median time
. tabulate treat, summarize(m)

            | Summary of Predicted median for
            |          (ltime,rtime]
  Treatment |        Mean   Std. Dev.       Freq.
------------+------------------------------------
      Radio |   39.332397           0          46
  Radio+Che |   22.300791           0          48
------------+------------------------------------
      Total |   30.635407   8.5595267          94
Expected median time to breast retraction is longer for the radiotherapy-only group than for the group that also received chemotherapy.
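The predicted medians can be checked by hand from the reported estimates. The sketch below assumes the usual Weibull proportional-hazards survivor function, S(t) = exp(-lambda * t^p), whose median is (ln(2)/lambda)^(1/p). The numbers are copied from the output above, and the local macro names are hypothetical.

```stata
* Median of a Weibull PH model: t_med = (ln(2) / lambda)^(1/p)
* lambda0 = baseline _cons on the hazard scale
* hr      = hazard ratio for Radio+Chemo
* p       = Weibull shape parameter
local lambda0 = .0018503
local hr      = 2.498526
local p       = 1.613779

display "Radio:       " (ln(2) / `lambda0')^(1/`p')
display "Radio+Chemo: " (ln(2) / (`lambda0' * `hr'))^(1/`p')
```

Up to rounding of the copied estimates, the two results should be close to the tabulated medians of about 39.3 and 22.3. The standard deviation of 0 within each group reflects that treatment is the only covariate, so every patient in a group receives the same prediction.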
We can use margins to obtain confidence intervals for those values:
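. margins treat, predict(median time)

Adjusted predictions                            Number of obs     =         94
Model VCE    : OIM

Expression   : Predicted median for (ltime,rtime], predict(median time)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |
       Radio |    39.3324   5.342493     7.36   0.000      28.8613    49.80349
 Radio+Chemo |   22.30079   2.436642     9.15   0.000     17.52506    27.07652
------------------------------------------------------------------------------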
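The two intervals describe each group separately; to quantify the difference between the groups directly, one option is a contrast. This sketch assumes the Weibull model is the active estimation result and uses the reference-category contrast operator r. that margins accepts for factor variables. We have not run it on these data.

```stata
* Difference in predicted medians, Radio+Chemo minus Radio,
* with a delta-method standard error and confidence interval
margins r.treat, predict(median time)
```

From the margins reported above, the point estimate of this difference should be about 22.3 - 39.3, or roughly -17 time units.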
Next, we compare the average patient’s survival curve under radiotherapy only (treat = 0) and under radiotherapy plus chemotherapy (treat = 1). We can plot the survival functions for both treatments using the stcurve command:
. stcurve, survival at1(treat = 0) at2(treat = 1)
From the above survival function plot, we see that the risk of developing breast retraction for an average patient in the radiotherapy-plus-chemotherapy treatment group is higher than that for the same patient in the radiotherapy-only treatment group. In other words, the adjuvant chemotherapy increases the risk of breast retraction.
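The same comparison can be viewed on the hazard scale. A minimal sketch, assuming the Weibull model is still the active estimation result and that stcurve's hazard option is available after stintreg as it is after other parametric survival models:

```stata
* Hazard functions for the two treatment groups from the fitted Weibull model
stcurve, hazard at1(treat = 0) at2(treat = 1)
```

Because the estimated shape parameter p is greater than 1, both hazard curves should rise over time, and under the proportional-hazards assumption their ratio stays constant at the estimated hazard ratio of about 2.5.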
Prefer to point and click instead of typing commands? No worries. All of stintreg's features can also be accessed through Stata's menus and dialog boxes.
This example only touches on the types of models and analyses available for interval-censored survival-time data. See [ST] stintreg to learn more.
— Xiao Yang
Senior Statistician and Software Developer