GOF plots for survival models

Goodness-of-fit plots for survival models

Order

<- See Stata 18's new features

Highlights

Parametric and semiparametric survival models
Right-censored and interval-censored data
Three estimators for the cumulative hazard function
By-group and stratified models
See more survival analysis features

Stata 18 provides the new estat gofplot command to produce goodness-of-fit (GOF) plots for survival models. You can use it after five survival models: right-censored Cox (stcox), interval-censored Cox (stintcox), right-censored parametric (streg), interval-censored parametric (stintreg), and marginal Cox model for interval-censored multiple-event data (stmgintcox). Check model fit after stratified models or separately for each by-group.

GOF plots provide visual checks for how well the model fits the data. In survival analysis, these checks are based on so-called Cox–Snell residuals and the assumption that, if a model is correct, these residuals should have a standard exponential distribution. Visually, this assumption is assessed by plotting the residuals against their estimated cumulative hazard—the closer the plotted values are to the 45° line, the better the fit (Cox and Snell 1968).

Let's see it work

-> GOF plots for right-censored data

-> GOF plots for interval-censored data

-> GOF plots for interval-censored multiple-event data

GOF plots for right-censored data

We use the dataset on 103 patients admitted to the Stanford Heart Transplantation Program (Crowley and Hu 1977). The dataset includes the year the patient was accepted into the program (year), the patient's age (age), whether the patient had other heart surgery previously (surgery), and whether the patient received a transplant (posttran). We wish to analyze time until death and check whether our model fits the data well. We fit a Cox model first by typing

. stcox age posttran surgery year, nolog 

        Failure _d: died
  Analysis time _t: t1
       ID variable: id

Cox regression with Breslow method for ties

No. of subjects =      103                              Number of obs =    172
No. of failures =       75
Time at risk    = 31,938.1
                                                        LR chi2(4)    =  17.56
Log likelihood = -289.53378                             Prob > chi2   = 0.0015



          _t   Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]

        age      1.030224   .0143201     2.14   0.032     1.002536    1.058677
   posttran      .9787243   .3032597    -0.07   0.945     .5332291    1.796416
    surgery      .3738278    .163204    -2.25   0.024     .1588759       .8796
       year      .8873107    .059808    -1.77   0.076     .7775022    1.012628

We then use estat gofplot to visually explore the model's GOF.

. estat gofplot

Comparing the blue line with the black reference line, we conclude that our Cox model fits the data well.

For right-censored data, instead of the default Nelson–Aalen estimator (Nelson 1972; Aalen 1978), we can use option km to use the alternative minus log of the Kaplan–Meier estimator (Kaplan and Meier 1958).

Let's now fit a stratified Cox model, which assumes that the baseline hazard functions are different between patients from different groups (pgroup) but the coefficients are equal across those groups.

. stcox age posttran surg year, strata(pgroup) nolog

        Failure _d: died
  Analysis time _t: t1
       ID variable: id

Stratified Cox regression with Breslow method for ties
Strata variable: pgroup

No. of subjects =      103                              Number of obs =    172
No. of failures =       75
Time at risk    = 31,938.1
                                                        LR chi2(4)    =  20.67
Log likelihood = -213.35033                             Prob > chi2   = 0.0004



          _t   Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]

        age      1.027406   .0150188     1.85   0.064     .9983874    1.057268
   posttran      1.075476   .3354669     0.23   0.816      .583567    1.982034
    surgery      .2222415   .1218386    -2.74   0.006     .0758882    .6508429
       year      .5523966   .1132688    -2.89   0.004     .3695832     .825638

After our stratified model, we use option stratify with estat gofplot to produce a separate plot for each stratum of pgroup.

. estat gofplot, stratify

The model fits data well in all strata. The red line for pgroup = 2 deviates from the reference line toward the end. This is not uncommon to see in practice because fewer observations are available for estimation toward the end of the study.

To aid visual inspection of the plot, we can also add option separate to produce separate graphs for each stratum.

. estat gofplot, stratify separate

GOF plots for interval-censored data

We use the dataset of a study for early breast cancer patients (Finkelstein and Wolfe 1985) that compares the cosmetic effects of two cancer treatments (treat) on breast retraction. Because patients were observed at random follow-up times, the exact time of breast retraction was not observed and was known only to fall in the interval between visits (variables ltime and rtime). First, we fit an interval-censored Weibull model of time to breast retraction on treatment using stintreg:

. stintreg i.treat, interval(ltime rtime) distribution(weibull) nolog

Weibull PH regression                               Number of obs     =     94
                                                           Uncensored =      0
                                                        Left-censored =      5
                                                       Right-censored =     38
                                                       Interval-cens. =     51

                                                    LR chi2(1)        =  10.93
Log likelihood = -143.19228                         Prob > chi2       = 0.0009



               Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]

      treat                                                                   
Radio+Chemo      2.498526   .7069467     3.24   0.001     1.434961    4.350383
      _cons      .0018503   .0013452    -8.66   0.000      .000445     .007693

      /ln_p      .4785786   .1198972     3.99   0.000     .2435844    .7135729

          p      1.613779   .1934876                      1.275814    2.041271
        l/p      .6196635   .0742959                      .4898907    .7838134

Note: _cons estimates baseline hazard.

We then use estat gofplot to produce the GOF plot.

. estat gofplot

With interval-censored data, Cox–Snell-like residuals are defined and used for plotting (Farrington 2000). If a model fits the data well, these residuals should approximate the censored standard exponential distribution. Also, the nonparametric Turnbull estimator (Turnbull 1976) is used to estimate the cumulative hazard.

The jagged line stays close to the reference line in the above graph, which indicates that the Weibull model fits the data well.

Suppose that we now want to fit an exponential model and check its model fit. We type

. quietly stintreg i.treat, interval(ltime rtime) distribution(exponential)
. estat gofplot

Comparing this GOF plot with the one above, we can see that the Weibull model fits our data better than the exponential model.

GOF plots for interval-censored multiple-event data

We can also use estat gofplot to visually assess the overall model fit for interval-censored multiple-event data. We use a simulated dataset based on the ARIC study described in Xu, Zeng, and Lin (2023). The participants were followed over time and assessed for two diseases (diabetes and hypertension) during several follow-up examinations. The investigators are interested in the factors that influence the times to onset for those two diseases. The factors of interest include three demographic variables—race, male, community—and five baseline risk factors: age, bmi, glucose, sysbp, and diabp.

We first fit a marginal Cox proportional hazards model using stmgintcox:

. webuse aric
(Simulated ARIC data)

. stmgintcox age i.male i.community i.race bmi glucose sysbp diabp, id(id) 
     event (event) interval(ltime rtime) nolog favorspeed

note: using fixed step size with a multiplier of 5 to compute derivatives.
note: using EM and VCE tolerances of 0.0001.
note: option noemhsgtolerance assumed.

Marginal interval-censored Cox regression          Number of events   =      2
Baseline hazard: Reduced intervals                 Number of subjects =    200
                                                   Number of obs      =    400
ID variable: id                                            Uncensored =      0
Event variable: event                                   Left-censored =     47
Event-time interval:                                   Right-censored =    240
  Lower endpoint: ltime                                Interval-cens. =    113
  Upper endpoint: rtime
                                                   Wald chi2(20)      =  84.36
Log pseudolikelihood = -270.83984                  Prob > chi2        = 0.0000



                             Robust                                           
               Haz. ratio   std. err.      z    P>|z|     [95% conf. interval]
   
Diabetes                                                                      
         age     .9552606   .0295589    -1.48   0.139     .8990481    1.014988
                                                                              
        male                                                                  
        Yes      .8084224   .2400335    -0.72   0.474      .451755    1.446684
                                                                              
   community                                                                  
    Jackson      1.597828   .6069935     1.23   0.217     .7588748    3.364265
Minneapolis      1.028054    .342976     0.08   0.934     .5346148    1.976929
 Washington      1.407869   .5192024     0.93   0.354     .6833627    2.900504
                                                                              
        race                                                                  
      White      .4289702   .1273669    -2.85   0.004     .2397145    .7676444
         bmi     1.116579    .034187     3.60   0.000     1.051545    1.185636
     glucose     1.139753   .0303702     4.91   0.000     1.081756    1.200859
       sysbp     1.020295   .0122308     1.68   0.094     .9966021     1.04455
       diabp     .9928634   .0127512    -0.56   0.577     .9681835    1.018172
   
Hypertension                                                                  
         age     .9950085   .0225503    -0.22   0.825     .9517779    1.040203
                                                                              
        male                                                                  
        Yes      .6671401   .1599892    -1.69   0.091     .4169533    1.067448
                                                                              
   community                                                                  
    Jackson      .6085406   .1953944    -1.55   0.122     .3243246    1.141824
Minneapolis      .9040647   .2719638    -0.34   0.737     .5013468    1.630275
 Washington       .674088   .2085739    -1.27   0.202     .3675707     1.23621
                                                                              
        race                                                                  
      White      1.261355    .425064     0.69   0.491     .6516152    2.441652
         bmi     1.012196   .0195117     0.63   0.529     .9746672     1.05117
     glucose      .989899   .0101396    -0.99   0.322     .9702238    1.009973
       sysbp     1.075011   .0162901     4.77   0.000     1.043553    1.107418
       diabp     1.025533   .0134835     1.92   0.055     .9994433    1.052303

Note: Standard error estimates may be more variable for small datasets and
      datasets with low proportions of interval-censored observations.

Now, let's produce the goodness-of-fit plots for all events. By default, estat gofplot creates a single graph with subgraphs for each event.

. estat gofplot

You can add the sepevents option to request that the plot for each event be placed on a separate graph.

. estat gofplot, sepevents

If we want to examine the goodness-of-fit plots for diabetes across different communities, we can use the by(community) option along with events("Diabetes"). The estat gofplot command will display these community-specific plots for diabetes overlaid on a single graph.

. estat gofplot, events("Diabetes") by(community)

We can also add the separate option to produce separate subgraphs for each community.

. estat gofplot, events("Diabetes") by(community) separate

References

Aalen, O. O. 1978. Nonparametric inference for a family of counting processes. Annals of Statistics 6: 701–726. https://doi.org/10.1214/aos/1176344247.

The ARIC investigators. 1989. The Atherosclerosis Risk in Communities (ARIC) study: Design and objectives. American Journal of Epidemiology 129: 687–702. https://doi.org/10.1093/oxfordjournals.aje.a115184.

Cox, D. R., and E. J. Snell. 1968. A general definition of residuals (with discussion). Journal of the Royal Statistical Society, Series B 30: 248–275.

Crowley, J., and M. Hu. 1977. Covariance analysis of heart transplant survival data. Journal of the American Statistical Association 72: 27–36.

Farrington, C. P. 2000. Residuals for proportional hazards models with interval-censored survival data. Biometrics 56: 473–482.

Finkelstein, D. M., and R. A. Wolfe. 1985. A semiparametric model for regression analysis of interval-censored failure time data. Biometrics 41: 933–945.

Kaplan, E. L., and P. Meier. 1958. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53: 457–481.

Nelson, W. 1972. Theory and applications of hazard plotting for censored failure data. Technometrics 14: 945–966.

Turnbull, B. W. 1976. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society, Series B 38: 290–295.

Xu, Y., D. Zeng, and D. Lin. 2023. Marginal proportional hazards models for multivariate interval-censored data. Biometrika 110: 815–830.

Tell me more

Learn more about other new features in survival analysis.

View all the new features in Stata 18.

Made for data science.

Get started today.

Order

Upgrade

Goodness-of-fit plots for survival models

Let's see it work

GOF plots for right-censored data

GOF plots for interval-censored data

GOF plots for interval-censored multiple-event data

References

Tell me more

Made for data science.

Get started today.

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies


_t		Haz. ratio Std. err. z P>\|z\| [95% conf. interval]

age		1.030224 .0143201 2.14 0.032 1.002536 1.058677
posttran		.9787243 .3032597 -0.07 0.945 .5332291 1.796416
surgery		.3738278 .163204 -2.25 0.024 .1588759 .8796
year		.8873107 .059808 -1.77 0.076 .7775022 1.012628


_t		Haz. ratio Std. err. z P>\|z\| [95% conf. interval]

age		1.027406 .0150188 1.85 0.064 .9983874 1.057268
posttran		1.075476 .3354669 0.23 0.816 .583567 1.982034
surgery		.2222415 .1218386 -2.74 0.006 .0758882 .6508429
year		.5523966 .1132688 -2.89 0.004 .3695832 .825638


		Haz. ratio Std. err. z P>\|z\| [95% conf. interval]

treat
Radio+Chemo		2.498526 .7069467 3.24 0.001 1.434961 4.350383
_cons		.0018503 .0013452 -8.66 0.000 .000445 .007693

/ln_p		.4785786 .1198972 3.99 0.000 .2435844 .7135729

p		1.613779 .1934876 1.275814 2.041271
l/p		.6196635 .0742959 .4898907 .7838134


		Robust
		Haz. ratio std. err. z P>\|z\| [95% conf. interval]

Diabetes
age		.9552606 .0295589 -1.48 0.139 .8990481 1.014988

male
Yes		.8084224 .2400335 -0.72 0.474 .451755 1.446684

community
Jackson		1.597828 .6069935 1.23 0.217 .7588748 3.364265
Minneapolis		1.028054 .342976 0.08 0.934 .5346148 1.976929
Washington		1.407869 .5192024 0.93 0.354 .6833627 2.900504

race
White		.4289702 .1273669 -2.85 0.004 .2397145 .7676444
bmi		1.116579 .034187 3.60 0.000 1.051545 1.185636
glucose		1.139753 .0303702 4.91 0.000 1.081756 1.200859
sysbp		1.020295 .0122308 1.68 0.094 .9966021 1.04455
diabp		.9928634 .0127512 -0.56 0.577 .9681835 1.018172

Hypertension
age		.9950085 .0225503 -0.22 0.825 .9517779 1.040203

male
Yes		.6671401 .1599892 -1.69 0.091 .4169533 1.067448

community
Jackson		.6085406 .1953944 -1.55 0.122 .3243246 1.141824
Minneapolis		.9040647 .2719638 -0.34 0.737 .5013468 1.630275
Washington		.674088 .2085739 -1.27 0.202 .3675707 1.23621

race
White		1.261355 .425064 0.69 0.491 .6516152 2.441652
bmi		1.012196 .0195117 0.63 0.529 .9746672 1.05117
glucose		.989899 .0101396 -0.99 0.322 .9702238 1.009973
sysbp		1.075011 .0162901 4.77 0.000 1.043553 1.107418
diabp		1.025533 .0134835 1.92 0.055 .9994433 1.052303