The input data for the survival-analysis features are duration records: each observation records a span of time over which the subject was observed, along with an outcome at the end of the period. There can be one record per subject or, if covariates vary over time, multiple records.
You can obtain simple descriptions:
. webuse cancer (Patient survival in drug trial) . stset study died Survival-time data settings Failure event: died!=0 & died<. Observed time interval: (0, studytime] Exit on or before: failure
48 total observations |
0 exclusions |
48 observations remaining, representing |
31 failures in single-record/single-failure data |
744 total analysis time at risk and under observation |
At risk from t = 0 |
Earliest observed entry t = 0 |
Last observed exit t = 39 |
Per subject | |||||
Category Total Mean Min Median Max | |||||
Number of subjects 48 | |||||
Number of records 48 1 1 1 1 | |||||
Entry time (first) 0 0 0 0 | |||||
Exit time (final) 15.5 1 12.5 39 | |||||
Subjects with gap 0 | |||||
Time on gap 0 | |||||
Time at risk 744 15.5 1 12.5 39 | |||||
Failures 31 .6458333 0 1 1 | |||||
| ||||||||||
dose | Time at risk rate subjects 25% 50% 75% | |||||||||
Control | 180 .1055556 20 4 8 12 | |||||||||
5 mg | 209 .0287081 14 13 22 23 | |||||||||
10 mg | 355 .0169014 14 25 33 . | |||||||||
Total | 744 .0416667 48 8 17 33 |
You can also compare the survivor functions
. sts list, by(dose) compare Failure _d: died Analysis time _t: studytime Kaplan–Meier survivor function By variable: dose
dose Control 5 mg 10 mg |
Time 1 0.9000 1.0000 1.0000 |
5 0.6000 1.0000 1.0000 |
9 0.4500 0.8512 0.9286 |
13 0.2250 0.7448 0.8571 |
17 0.1125 0.6207 0.8571 |
21 0.1125 0.6207 0.8571 |
25 . 0.2069 0.6857 |
29 . 0.2069 0.5878 |
33 . . 0.4408 |
37 . . 0.4408 |
41 . . . |
or you can review the complete life table:
. sts list, by(dose) Failure _d: died Analysis time _t: studytime Kaplan–Meier survivor function By variable: dose
At Net Survivor Std. |
Time risk Fail lost function error [95% conf. int.] |
Control |
1 20 2 0 0.9000 0.0671 0.6560 0.9740 |
2 18 1 0 0.8500 0.0798 0.6038 0.9490 |
3 17 1 0 0.8000 0.0894 0.5511 0.9198 |
4 16 2 0 0.7000 0.1025 0.4505 0.8525 |
(output omitted) |
23 1 1 0 0.0000 . . . |
5 mg |
6 14 1 1 0.9286 0.0688 0.5908 0.9896 |
7 12 1 0 0.8512 0.0973 0.5234 0.9607 |
9 11 0 1 0.8512 0.0973 0.5234 0.9607 |
10 10 0 1 0.8512 0.0973 0.5234 0.9607 |
(output omitted) |
32 1 0 1 0.2069 0.1769 0.0104 0.5804 |
10 mg |
6 14 1 0 0.9286 0.0688 0.5908 0.9896 |
10 13 1 0 0.8571 0.0935 0.5394 0.9622 |
17 12 0 1 0.8571 0.0935 0.5394 0.9622 |
19 11 0 1 0.8571 0.0935 0.5394 0.9622 |
(output omitted) |
39 1 0 1 0.4408 0.1673 0.1312 0.7187 |
Just as easily, you can obtain a graph
. sts graph, by(dose)
or test the equality of the survivor functions:
. sts test dose Failure _d: died Analysis time _t: studytime Equality of survivor functions Log-rank test
Observed Expected | ||
dose | events events | |
Control | 19 7.25 | |
5 mg | 6 8.20 | |
10 mg | 6 15.56 | |
Total | 31 31.00 |
We used the log-rank test, but we could have specified the Wilcoxon–Breslow–Gehan test, the Tarone–Ware test, the Peto–Peto–Prentice test, or the Fleming–Harrington test.
We could also perform stratified versions of these tests to control for an external covariate:
. generate agecat = 1 . replace agecat = 2 if age > 55 (25 real changes made) . replace agecat = 3 if age > 60 (11 real changes made) . tabulate agecat
agecat | Freq. Percent Cum. | |
1 | 23 47.92 47.92 | |
2 | 14 29.17 77.08 | |
3 | 11 22.92 100.00 | |
Total | 48 100.00 |
Observed Expected | ||
dose | events events* | |
Control | 19 7.37 | |
5 mg | 6 9.67 | |
10 mg | 6 13.95 | |
Total | 31 31.00 |
Stata can fit Cox proportional hazards, exponential, Weibull, Gompertz, lognormal, log-logistic, and gamma models. In the case of the Cox proportional hazards model,
simple and stratified estimates are available
right censoring, left truncation (delayed entry), intermediary gaps are allowed
conventional and robust estimates of variance are available (Lin and Wei 1989)
The same is true of the parametric models. For exponential and Weibull models, estimates are available in either the accelerated-time or hazard metric.
Here we will focus on the Cox proportional hazards model using a model fitted on our dose–age data that we described above:
. stcox age i.dose Failure _d: died Analysis time _t: studytime Iteration 0: log likelihood = -99.911448 Iteration 1: log likelihood = -82.331523 Iteration 2: log likelihood = -81.676487 Iteration 3: log likelihood = -81.652584 Iteration 4: log likelihood = -81.652567 Refining estimates: Iteration 0: log likelihood = -81.652567 Cox regression with Breslow method for ties No. of subjects = 48 Number of obs = 48 No. of failures = 31 Time at risk = 744 LR chi2(3) = 36.52 Log likelihood = -81.652567 Prob > chi2 = 0.0000
_t | Haz. ratio Std. err. z P>|z| [95% conf. interval] | |
age | 1.118334 .0409074 3.06 0.002 1.040963 1.201455 | |
dose | ||
5 mg | .1805839 .0892742 -3.46 0.001 .0685292 .4758636 | |
10 mg | .0520066 .034103 -4.51 0.000 .0143843 .1880305 | |
By default, stcox uses Breslow’s method for handling ties and presents results as hazard ratios—that is, exponentiated coefficients—but we can see the underlying coefficients if we wish:
. stcox, nohr Cox regression with Breslow method for ties No. of subjects = 48 Number of obs = 48 No. of failures = 31 Time at risk = 744 LR chi2(3) = 36.52 Log likelihood = -81.652567 Prob > chi2 = 0.0000
_t | Coefficient Std. err. z P>|z| [95% conf. interval] | |
age | .11184 .0365789 3.06 0.002 .0401467 .1835333 | |
dose | ||
5 mg | -1.71156 .4943639 -3.46 0.001 -2.680495 -.7426241 | |
10 mg | -2.956384 .6557432 -4.51 0.000 -4.241617 -1.671151 | |
stcox can also handle ties using Efron’s method, the exact partial-likelihood method, or the exact marginal-likelihood method.
We can as easily fit the model with robust estimates of variance:
. stcox age i.dose, vce(robust) Failure _d: died Analysis time _t: studytime Iteration 0: log pseudolikelihood = -99.911448 Iteration 1: log pseudolikelihood = -82.331523 Iteration 2: log pseudolikelihood = -81.676487 Iteration 3: log pseudolikelihood = -81.652584 Iteration 4: log pseudolikelihood = -81.652567 Refining estimates: Iteration 0: log pseudolikelihood = -81.652567 Cox regression with Breslow method for ties No. of subjects = 48 Number of obs = 48 No. of failures = 31 Time at risk = 744 Wald chi2(3) = 32.39 Log pseudolikelihood = -81.652567 Prob > chi2 = 0.0000
Robust | ||
_t | Haz. ratio std. err. z P>|z| [95% conf. interval] | |
age | 1.118334 .0327643 3.82 0.000 1.055926 1.18443 | |
dose | ||
5 mg | .1805839 .0773571 -4.00 0.000 .0779917 .4181288 | |
10 mg | .0520066 .0349232 -4.40 0.000 .0139465 .1939333 | |
By using predict after stcox, we can obtain the following:
baseline hazard (and stratified baseline hazard if estimates are stratified)
cumulative baseline hazard
baseline survivor function (and stratified baseline survivor function if estimates are stratified)
martingale residuals
Cox–Snell residuals
deviance residuals
likelihood displacement values
LMAX measures of influence
log-frailties
efficient score residuals
DFBETA measures of influence
Schoenfeld residuals
scaled Schoenfeld residuals
The data used above have censored observations but no time-varying covariates and no left truncation. The failure event—death—occurs only once. Had the data included time-varying covariates, left truncation, or recurring failure events, however, nothing would have changed in terms of what we type, and importantly, all the same features are available postestimation regardless of the characteristics of the data.
We can also fit Cox (as well as parametric) models with random effects.
Known as
. webuse catheter, clear (Kidney data, McGilchrist and Aisbett, Biometrics, 1991) . stset time infect Survival-time data settings Failure event: infect!=0 & infect<. Observed time interval: (0, time] Exit on or before: failure
76 total observations |
0 exclusions |
76 observations remaining, representing |
58 failures in single-record/single-failure data |
7,424 total analysis time at risk and under observation |
At risk from t = 0 |
Earliest observed entry t = 0 |
Last observed exit t = 562 |
_t | Haz. ratio Std. err. z P>|z| [95% conf. interval] | |
age | 1.006202 .0120965 0.51 0.607 .9827701 1.030192 | |
female | .2068678 .095708 -3.41 0.001 .0835376 .5122756 | |
theta | .4754497 .2673108 | |
Lin, D. Y. and L. J. Wei. 1989. The robust inference of the Cox proportional hazards model. Journal of the American Statistical Association 84: 1074–1078.