Marginal Cox PH model for interval-censored multiple-event data

Marginal Cox PH model for interval-censored multiple-event data StataNow

StataNow

Order

<- See more new Stata features

Highlights

Marginal Cox proportional hazards model for interval-censored multiple-event data
Left-censoring, right-censoring, interval-censoring
Single-record and multiple-record data format
Support for time-varying covariates
Stratified estimation
Support for robust and cluster–robust standard errors
Testing proportional-hazards assumption
Powerful test for and estimation of a common covariate effect across all events
Predictions of baseline functions, martingale-like residuals, Cox–Snell-like residuals, and time-varying prediction
Graphs of survivor, cumulative hazard, and hazard functions
Graphical checks of proportional-hazards assumption
See more survival analysis features

Need to analyze event times from multiple types of events such as the onset of diabetes and hypertension? Don't know the exact event times? Use the new stmgintcox command to analyze such interval-censored multiple-event data and account for possible correlation between event times across the different events. Evaluate the proportional-hazards assumption. Perform a powerful test for a common covariate effect across all events. Graph covariate-specific survivor, hazard, and other functions. And more! This command is part of StataNow™.

Interval-censored multiple-event data commonly arise in longitudinal studies because each study subject may experience several types of events and those events are not observed directly but are known to occur within some time interval. This data type arises in many fields, including medicine, epidemiology, biology, and sociology. For example, an epidemiologist studying chronic diseases might collect data on patients with multiple conditions, such as heart disease and metabolic disease, during different doctor visits. Similarly, a sociologist might conduct surveys to record major life events, such as job changes and marriages, at regular intervals. In ecology, researchers might monitor reproductive cycles of animals, such as nesting and birthing, through periodic observations. In these studies, researchers are often interested in evaluating the effects of certain factors on the event times. However, analyzing interval-censored multiple-event data is challenging because none of the event times are exactly observed and the dependence structure between different event times is often unknown.

Marginal proportional hazards models can be used to analyze interval-censored multiple-event data. These models do not require modeling the dependence structure between different events, thus providing more robust inference. They also produce parameters that can be interpreted as population-average effects. Additionally, they are faster than their random-effects counterparts.

In Stata 17, we introduced the stintcox command to fit genuine semiparametric Cox models for univariate interval-censored event-time data. In Stata 18, we expanded the functionality of stintcox to support time-varying covariates (TVCs). The new stmgintcox command fits a marginal proportional hazards model to interval-censored multiple-event data. You can use this command with either single- or multiple-record-per-event data, and it supports TVCs for all events or specific ones. It also offers flexible ways to specify models with event-specific covariates. After fitting the model, you can estimate and test the average effect of a covariate across all event times using a more powerful test than the classic multivariate Wald test. You can also generate event-specific predictions, create plots of covariate-adjusted survivor and other functions, and produce goodness-of-fit plots after stmgintcox.

Let's see it work

-> Fitting a model for single-record-per-event interval-censored data

-> Fitting a model with event-specific covariates

-> Estimating and testing the average effect of a covariate across all events

-> Graphing survivor functions

-> Assessing overall model fit by goodness-of-fit plots

-> Fitting a model for multiple-record-per-event interval-censored data

Fitting a model for single-record-per-event interval-censored data

Consider a fictional dataset simulated based on the Atherosclerosis Risk in Communities (ARIC) study described in Xu, Zeng, and Lin (2023). This dataset comprises 200 subjects, each from one of four US communities. The participants were followed over time and assessed for both diabetes and hypertension during several follow-up examinations. Because the subjects were examined only periodically, the exact onset times of these diseases were not observed, but they are known to fall in intervals between doctor visits. Two variables, ltime and rtime, capture this information by recording the last examination time before the event occurred and the first examination time after the event, respectively.

We wish to identify the factors that influence the times to onset of diabetes and hypertension. The factors of interest include three demographic variables—race, sex (male), and community—and five baseline risk factors: age (age), body mass index (bmi), glucose level (glucose), systolic blood pressure (sysbp), and diastolic blood pressure (diabp). The dataset contains one record per event per subject with the event-time information recorded as interval data and is called single-record-per-event interval-censored data. Here is a subset of the dataset for subjects 91 and 92:

. webuse aric
(Simulated ARIC data)                  

. format bmi glucose %6.2f

. list id event ltime rtime bmi - diabp if id==91| id==92, sepby(id) noobs


 id          event   ltime   rtime     bmi   glucose   sysbp   diabp 
 91       Diabetes    2510       .   32.43     91.39     128      79 
 91   Hypertension      98    1418   32.43     91.39     128      79 
 92       Diabetes     596       .   27.04    105.39     118      82 
 92   Hypertension       0     596   27.04    105.39     118      82

We can fit a marginal Cox proportional hazards model in which the times to onset of diabetes and hypertension depend on the above factors of interest. In a single-record-per-event format, we must specify the id(), event(), and interval() options. The command performs intensive computations and may take a little longer to run.

. stmgintcox age i.male i.community i.race bmi glucose sysbp diabp, id(id) 
     event(event) interval(ltime rtime)
note: using adaptive step size to compute derivatives.

Performing EM optimization for event "Diabetes" (showing every 100 iterations):
Iteration 0:   Log pseudolikelihood = -179.18329
Iteration 100: Log pseudolikelihood = -110.21215
Iteration 200: Log pseudolikelihood = -109.98083
Iteration 300: Log pseudolikelihood = -109.93752
Iteration 400: Log pseudolikelihood = -109.92532
Iteration 500: Log pseudolikelihood = -109.92091
Iteration 600: Log pseudolikelihood = -109.91902
Iteration 624: Log pseudolikelihood = -109.91874

Performing EM optimization for event "Hypertension" (showing every 100 iterations):
Iteration 0:   Log pseudolikelihood = -208.75033
Iteration 100: Log pseudolikelihood = -160.09944
Iteration 200: Log pseudolikelihood = -159.88744
Iteration 300: Log pseudolikelihood =  -159.8376
Iteration 400: Log pseudolikelihood = -159.82201
Iteration 500: Log pseudolikelihood =  -159.8162
Iteration 600: Log pseudolikelihood = -159.81368
Iteration 610: Log pseudolikelihood = -159.81352

Computing standard errors: .....................................................
> ..............................................................................
> ..............................................................................
> ..............................................................................
> ..............................................................................
> ..............................................................................
> ........................... done

Marginal interval-censored Cox regression          Number of events   =      2
Baseline hazard: Reduced intervals                 Number of subjects =    200
                                                   Number of obs      =    400
ID variable: id                                            Uncensored =      0
Event variable: event                                   Left-censored =     47
Event-time interval:                                   Right-censored =    240
  Lower endpoint: ltime                                Interval-cens. =    113
  Upper endpoint: rtime
                                                   Wald chi2(20)      = 128.31
Log pseudolikelihood = -269.73225                  Prob > chi2        = 0.0000



                             Robust                                           
               Haz. ratio   std. err.      z    P>|z|     [95% conf. interval]
   
Diabetes                                                                      
         age       .95663   .0283264    -1.50   0.134     .9026917    1.013791
                                                                              
        male                                                                  
        Yes      .8067524   .2301774    -0.75   0.452     .4611911    1.411236
                                                                              
   community                                                                  
    Jackson      1.594206   .6427935     1.16   0.247     .7233276    3.513614
Minneapolis      1.036757   .3611512     0.10   0.917      .523798    2.052062
 Washington      1.419528     .49871     1.00   0.319     .7130152    2.826109
                                                                              
        race                                                                  
      White      .4267559   .1203645    -3.02   0.003     .2455286     .741749
         bmi     1.119352   .0262064     4.82   0.000     1.069149    1.171913
     glucose     1.139996   .0198973     7.51   0.000     1.101657    1.179669
       sysbp     1.020623   .0117379     1.77   0.076     .9978743     1.04389
       diabp     .9920976   .0117945    -0.67   0.505      .969248    1.015486
   
Hypertension                                                                  
         age     .9951778   .0192589    -0.25   0.803     .9581379     1.03365
                                                                              
        male                                                                  
        Yes      .6667426   .1584725    -1.71   0.088     .4184497    1.062364
                                                                              
   community                                                                  
    Jackson      .6045343   .2070578    -1.47   0.142     .3089425    1.182944
Minneapolis      .8997162   .2690857    -0.35   0.724     .5006446    1.616894
 Washington      .6734203   .2210427    -1.20   0.228     .3539067    1.281397
                                                                              
        race                                                                  
      White      1.240887   .3461451     0.77   0.439     .7182731    2.143752
         bmi     1.012572   .0170935     0.74   0.459     .9796174    1.046635
     glucose     .9898238   .0094633    -1.07   0.285     .9714488    1.008546
       sysbp     1.075034    .012595     6.18   0.000      1.05063    1.100006
       diabp     1.025542    .011092     2.33   0.020     1.004031    1.047514

Note: Standard error estimates may be more variable for small datasets and
      datasets with low proportions of interval-censored observations.

The header above the coefficient table gives a summary of the censoring information. In the diabetes output, we notice that White individuals have a lower risk of diabetes. Additionally, higher body mass index and elevated glucose levels are associated with a higher risk of diabetes. In the output for hypertension, we see that higher systolic and diastolic blood pressure are associated with a greater risk of hypertension.

Fitting a model with event-specific covariates

From the model above, we can see that body mass index and glucose level are key risk factors for diabetes but not for hypertension. On the flip side, systolic and diastolic blood pressure are important factors for hypertension but not for diabetes. Therefore, we can use different sets of covariates to model these two events. We also use the nolog option to suppress iteration logs and the favorspeed option to speed up computation.

. stmgintcox ("Diabetes": age i.male i.community i.race bmi glucose) 
     ("Hypertension": age i.male i.community i.race sysbp diabp), id(id) 
     event(event) interval(ltime rtime) nolog favorspeed 

note: using fixed step size with a multiplier of 5 to compute derivatives.
note: using EM and VCE tolerances of 0.0001.
note: option noemhsgtolerance assumed.

Marginal interval-censored Cox regression          Number of events   =      2
Baseline hazard: Reduced intervals                 Number of subjects =    200
                                                   Number of obs      =    400
ID variable: id                                            Uncensored =      0
Event variable: event                                   Left-censored =     47
Event-time interval:                                   Right-censored =    240
  Lower endpoint: ltime                                Interval-cens. =    113
  Upper endpoint: rtime
                                                   Wald chi2(16)      =  77.01
Log pseudolikelihood = -272.76543                  Prob > chi2        = 0.0000



                             Robust                                           
               Haz. ratio   std. err.      z    P>|z|     [95% conf. interval]
   
Diabetes                                                                      
         age     .9693495   .0293552    -1.03   0.304     .9134885    1.028626
                                                                              
        male                                                                  
        Yes      .8021755   .2273265    -0.78   0.437     .4603091    1.397942
                                                                              
   community                                                                  
    Jackson      1.549902   .6274179     1.08   0.279     .7010166    3.426733
Minneapolis      .9649113   .3361108    -0.10   0.918     .4875122    1.909806
 Washington       1.36829   .5112313     0.84   0.401     .6578786    2.845842
                                                                              
        race                                                                  
      White      .4412767    .135994    -2.65   0.008     .2412044    .8073037
         bmi     1.112781   .0314166     3.79   0.000     1.052878    1.176092
     glucose     1.141379   .0304922     4.95   0.000     1.083153    1.202735
   
Hypertension                                                                  
         age     .9945906   .0220662    -0.24   0.807     .9522686    1.038794
                                                                              
        male                                                                  
        Yes      .6229044   .1403048    -2.10   0.036     .4005846    .9686091
                                                                              
   community                                                                  
    Jackson       .606375   .1824113    -1.66   0.096     .3362643    1.093457
Minneapolis      .8873364   .2642854    -0.40   0.688     .4949546    1.590784
 Washington      .6548935   .1999546    -1.39   0.166     .3599802    1.191414
                                                                              
        race                                                                  
      White       1.26674   .4058107     0.74   0.460     .6760798    2.373433
       sysbp     1.072573   .0149785     5.02   0.000     1.043614    1.102336
       diabp     1.025091   .0138294     1.84   0.066     .9983414    1.052558

Note: Standard error estimates may be more variable for small datasets and
      datasets with low proportions of interval-censored observations.

Because age, male, community, and race are common covariates for both events, we can also specify the model by using a combination of common covariates and event-specific covariates:

. stmgintcox age i.male i.community i.race ("Diabetes": bmi glucose) ("Hypertension": sysbp diabp), 
     id(id) event(event) interval(ltime rtime) nolog favorspeed

Both of the above specifications will give us the same result.

Estimating and testing the average effect of a covariate across all events

After fitting the above model, suppose we want to test the hypothesis that the effects of age are zero for all events. We can use the estat common command to estimate an optimal weighted average effect of age across all events and conduct a z test to determine whether this average effect is zero under the null hypothesis.

. estat common age

    _avg_age: .294*["Diabetes"]age + .706*["Hypertension"]age



               Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
   
    _avg_age    -.0129749   .0200839    -0.65   0.518    -.0523386    .0263887

We fail to reject the null hypothesis that the weighted average effect of age across all events is zero (the p-value is 0.518). When the effects are similar across different events, this test is more powerful than the traditional multivariate Wald test, which can be performed with the test command.

Graphing survivor functions

We can use stcurve to plot the estimated survivor functions. By default, the stcurve command with the survival option evaluates the survivor functions at the overall means of covariates for each event and plots the survivor functions for both events as subgraphs.

. stcurve, survival
note: function evaluated at overall means of covariates for specified events.

We can add the sepevents option to request that the estimated survivor function for each event be displayed on a separate graph.

If we wish to compare the survival curve of an average person with diabetes across different communities, we can specify multiple values for community using the at() option and specify the event value label "Diabetes" in the events() option.

. stcurve, survival at(community=(1 2 3 4)) events("Diabetes")
note: function evaluated at specified values of selected covariates and
      overall means of other covariates (if any) for specified event.

The above survival curves show that an average person in Forsyth (blue line) and one in Minneapolis (green line) have a similar higher risk of developing diabetes, whereas an average person in Washington (yellow line) and one in Jackson (red line) have a lower risk of developing diabetes.

Assessing overall model fit by goodness-of-fit plots

We can use the estat gofplot command to produce the event-specific goodness-of-fit plots and visually assess the overall model fit. estat gofplot calculates an empirical estimate of the cumulative hazard function based on the Cox–Snell-like residuals for each event and plots the resulting cumulative hazard rate against the residuals themselves. If the model fits the data, those plots are expected to remain close to the reference line.

. estat gofplot

By default, estat gofplot displays all event-specific goodness-of-fit plots as subgraphs within a single graph. The plot on the left indicates that the marginal Cox proportional hazards model fits the data well for diabetes. The plot on the right suggests that the marginal Cox model fits the data mostly well for hypertension, except for the tail, which deviates from the reference line because of an outlier.

Fitting a model for multiple-record-per-event interval-censored data

Interval-censored data can also be recorded in multiple-record-per-event format. In this format, each event for a subject may contain multiple records with multiple examination times and potential TVCs at each examination time. Here we use an extended version of the previous dataset. It contains all the baseline covariates as well as four time-varying covariates: bmi, glucose, sysbp, and diabp. It also records the examination time (time) and whether the event occurred since the last examination (status). Here is a subset of this dataset for subjects 91 and 92:

. webuse aric2
(Simulated ARIC data with time-varying variables)

. format bmi glucose %6.2f

. list id event time status bmi-diabp if id==91 | id==92, sepby(id) noobs


 id          event   time   status     bmi   glucose   sysbp   diabp 
 91       Diabetes     98        0   32.43     91.39     128      79 
 91       Diabetes   1418        0   36.10     97.95     130      72 
 91       Diabetes   2510        0   32.72     97.72     134      67 
 91   Hypertension     98        0   32.43     91.39     128      79 
 91   Hypertension   1418        1   36.10     97.95     130      72 
 92       Diabetes    596        0   27.04    105.39     118      82 
 92   Hypertension    596        1   27.04    105.39     118      82

We use stmgintcox to fit a marginal Cox proportional hazards model in which the times to onset of events (diabetes and hypertension) depend on baseline covariates age, male, community, and race and TVCs bmi, glucose, sysbp, and diabp. In a multiple-record-per-event format, we must specify the id(), event(), time(), and status() options. We also include the detail option to get detailed censoring information for each event.

. stmgintcox age i.male i.community i.race bmi glucose sysbp diabp, id(id) 
     event(event) time(time) status(status) detail nolog

note: time-varying covariates detected in the data; using method nearleft to
      impute their values between examination times.
note: using adaptive step size to compute derivatives.

Marginal interval-censored Cox regression        Number of events   =        2
Baseline hazard: Reduced intervals               Number of subjects =      200
                                                 Number of obs      =      830
ID variable: id                                          Uncensored =        0
Event variable: event                                 Left-censored =       47
Examination time: time                               Right-censored =      240
Status indicator: status                             Interval-cens. =      113

                                                 Wald chi2(20)      = 13197.36
Log pseudolikelihood = -264.77802                Prob > chi2        =   0.0000


                                       Subjects per event
          
              No. of   Uncensored                Censored               Total
Event            obs.                   Left      Right   Interval
 Diabetes        420            0         22        134         44        200
Hyperte~n        410            0         25        106         69        200




                             Robust                                           
        time   Haz. ratio   std. err.      z    P>|z|     [95% conf. interval]
   
Diabetes                                                                      
         age     .9551756    .028196    -1.55   0.120     .9014808    1.012069
                                                                              
        male                                                                  
        Yes      .8063021   .2397021    -0.72   0.469     .4502437    1.443936
                                                                              
   community                                                                  
    Jackson      1.495371   .6396808     0.94   0.347      .646587    3.458365
Minneapolis      1.022231   .3737445     0.06   0.952     .4992699    2.092967
 Washington      1.457473    .505914     1.09   0.278     .7381318    2.877844
                                                                              
        race                                                                  
      White      .4329106   .1197329    -3.03   0.002     .2517537    .7444243
         bmi     1.119833   .0211927     5.98   0.000     1.079057     1.16215
     glucose     1.160905   .0161295    10.74   0.000     1.129718    1.192952
       sysbp     1.024272   .0124467     1.97   0.048     1.000165    1.048959
       diabp     .9822802   .0117356    -1.50   0.135      .959546    1.005553

Hypertension                                                                  
         age     1.000677   .0172332     0.04   0.969      .967464     1.03503
                                                                              
        male                                                                  
        Yes      .7086539   .1573446    -1.55   0.121     .4586054    1.095038
                                                                              
   community                                                                  
    Jackson      .6784416   .2244297    -1.17   0.241      .354759    1.297453
Minneapolis       1.00722   .3078072     0.02   0.981      .553346    1.833379
 Washington      .7480168   .2412699    -0.90   0.368     .3975206    1.407548
                                                                              
        race                                                                  
      White      1.321735   .3545515     1.04   0.298     .7812895    2.236025
         bmi     1.020171   .0171845     1.19   0.236     .9870395    1.054414
     glucose     .9954429   .0094096    -0.48   0.629     .9771702    1.014057
       sysbp     1.077175   .0099561     8.04   0.000     1.057837    1.096866
       diabp     1.022089    .010188     2.19   0.028     1.002315    1.042254

Time varying: bmi glucose sysbp diabp
Note: Standard error estimates may be more variable for small datasets and
      datasets with low proportions of interval-censored observations.

After accounting for time-varying risk factors, we see that elevated glucose levels, a higher body mass index, and elevated systolic blood pressure are associated with an increased risk of diabetes onset. Being White is associated with a lower risk of developing diabetes. Also, elevated systolic and diastolic blood pressure levels are associated with a higher risk of developing hypertension.

References

Xu, Y., D. Zeng, and D. Lin. 2023. Marginal proportional hazards models for multivariate interval-censored data. Biometrika 110: 815–830.

The ARIC investigators. 1989. The Atherosclerosis Risk in Communities (ARIC) study: Design and objectives. American Journal of Epidemiology 129: 687–702. https://doi.org/10.1093/oxfordjournals.aje.a115184.

Tell me more

Learn more about how to fit marginal Cox proportional hazards models to multivariate interval-censored data using the stmgintcox command; see [ST] stmgintcox.

Learn more about how to fit semiparametric Cox proportional hazards models to univariate interval-censored data using the stintcox command; see [ST] stintcox.

View all the new features in Stata 18.

Made for data science.

Get started today.

Order

Upgrade