Home  /  Products  /  Stata 18  /  TVCs with interval-censored Cox model

<- See Stata 18's new features

Highlights

  • Single- and multiple-record data formats

  • Time-varying covariates (TVCs):

    • Created automatically as deterministic functions of time in both formats

    • Supplied directly in a multiple-record data format

    • Use to test the proportional-hazards assumption

  • Predictions with TVCs

  • Plots of functions with TVCs: survivor, cumulative hazard, and more

  • See more survival analysis features

Stata 17 introduced the stintcox command to fit genuine semiparametric Cox models to interval-censored event-time data. Stata 18 adds support for time-varying covariates (TVCs), commonly used in practice to record characteristics that change during follow-up. So, whether you have a biomarker variable or a policy variable that changes with time, you can now include them to model events of interest known only to lie within some time interval. For instance, the event may be a recurrence of cancer or a change in employment status recorded during one of the follow-ups. Both are examples of interval-censored event-time data, in which the event time is not observed exactly.

Use stintcox to create TVCs automatically by interacting existing covariates with specified deterministic functions of time or specify your own TVCs in the new multiple-record-per-subject data format. Use TVCs to test the proportional-hazards assumption with the Wald test or likelihood-ratio test. And incorporate TVCs in your predictions and plots of survivor and other functions.

Let's see it work

-> Testing the proportional-hazards assumption using TVCs

-> Fitting an interval-censored Cox model using TVCs in multiple-record data

-> Graphing survivor functions

Testing the proportional-hazards assumption using TVCs

Here we demonstrate how to use TVCs to test the proportional-hazards assumption of a Cox model. Following Zeng et al. (2016), we use an interval-censored Cox regression to model time to HIV infection of injecting drug users. All 1,124 subjects initially tested negative for HIV. They were followed and assessed for HIV-1 seropositivity through blood tests approximately every four months. Because the subjects were tested periodically, the exact time of HIV infection was not observed, but the times were known to fall in intervals between blood tests. The corresponding lower and upper time endpoints are recorded in variables ltime and rtime.

The baseline factors we use to model the HIV-1 seropositivity are centered age at recruitment (age_mean), sex (male), history of needle sharing (needle), history of drug injection (inject), and whether a subject has been in jail at the time of recruitment (jail). Here is a subset of the dataset for subjects 271 through 274 with one record per subject:

. use https://www.stata-press.com/data/r18/idu
(Modified Bangkok IDU Preparatory Study)

. format ltime rtime age_mean %6.2f

. list ltime rtime age_mean male needle inject jail if _n >= 271 & _n <= 274

ltime rtime age_mean male needle inject jail
271. 22.00 . -6.46 Yes Yes No No
272. 3.80 9.41 8.54 No No No Yes
273. 20.66 . -11.46 Yes Yes No No
274. 0.00 3.87 -4.46 Yes Yes Yes Yes

We fit a Cox proportional hazards model in which time to HIV infection depends on the above baseline factors. In a single-record-per-subject format, we specify the lower and upper event-time intervals in stintcox's interval() option.

. stintcox age_mean i.male i.needle i.inject i.jail, interval(ltime rtime)
note: using adaptive step size to compute derivatives.

Performing EM optimization (showing every 100 iterations):
Iteration 0:    Log likelihood = -1086.2564
Iteration 100:  Log likelihood = -597.65634
Iteration 200:  Log likelihood = -597.57555
Iteration 295:  Log likelihood = -597.56443

Computing standard errors: ............................ done

Interval-censored Cox regression                    Number of obs     =  1,124
Baseline hazard: Reduced intervals                         Uncensored =      0
                                                        Left-censored =     41
                                                       Right-censored =    991
                                                       Interval-cens. =     92

                                                    Wald chi2(5)      =  17.10
Log likelihood = -597.56443                         Prob > chi2       = 0.0043

OPG
Haz. ratio std. err. z P>|z| [95% conf. interval]
age_mean .9684341 .0126552 -2.45 0.014 .9439452 .9935582
male
Yes .6846949 .1855907 -1.40 0.162 .4025073 1.164717
needle
Yes 1.275912 .2279038 1.36 0.173 .8990401 1.810768
inject
Yes 1.250154 .2414221 1.16 0.248 .8562184 1.825334
jail
Yes 1.567244 .3473972 2.03 0.043 1.014982 2.419998
Note: Standard-error estimates may be more variable for small datasets and datasets with low proportions of interval-censored observations.

We find that age and being in jail at enrollment appear to be associated with, respectively, lower and higher risks of HIV infection.

One way of testing the proportional-hazards assumption for a covariate is to test whether the coefficient associated with that covariate is time invariant. This can be accomplished by including an interaction between this covariate and a function of time in the model and testing whether the corresponding coefficient equals zero.

To test the proportional-hazards assumption in our example, we include all covariates in option tvc() to additionally include their interactions with the analysis time _t, the default, in the model. In this analysis, we suppress the default reporting of hazard ratios with option nohr:

. stintcox age_mean i.male i.needle i.inject i.jail, interval(ltime rtime)
     tvc(age_mean i.male i.needle i.inject i.jail) nohr
note: using adaptive step size to compute derivatives.

Performing EM optimization (showing every 100 iterations):
Iteration 0:   Log likelihood = -1086.2564
Iteration 100: Log likelihood = -590.53655
Iteration 200: Log likelihood = -590.45163
Iteration 300: Log likelihood = -590.43665
Iteration 340: Log likelihood = -590.43386

Computing standard errors: .............................................. done


Interval-censored Cox regression                    Number of obs     =  1,124
Baseline hazard: Reduced intervals                         Uncensored =      0
                                                        Left-censored =     41
Event-time interval:                                   Right-censored =    991
  Lower endpoint: ltime                                Interval-cens. =     92
  Upper endpoint: rtime
                                                    Wald chi2(10)     =  31.99
Log likelihood = -590.43386                         Prob > chi2       = 0.0004

OPG
Haz. ratio std. err. z P>|z| [95% conf. interval]
main
age_mean -.0310177 .0233817 -1.33 0.185 -.076845 .0148097
male
Yes -1.271583 .4604788 -2.76 0.006 -2.174105 -.3690615
needle
Yes -.1819587 .3297493 -0.55 0.581 -.8282554 .464338
inject
Yes .6852961 .3431924 2.00 0.046 .0126513 1.357941
jail
Yes -.529615 .4021087 -1.32 0.188 -1.317734 .2585036
tvc
age_mean -.000129 .0017099 -0.08 0.940 -.0034804 .0032224
male
Yes .0884102 .042994 2.06 0.040 .0041434 .1726769
needle
Yes .0358545 .0238562 1.50 0.133 -.0109027 .0826118
inject
Yes -.0361192 .0228754 -1.58 0.114 -.0809541 .0087157
jail
Yes .0916036 .0348915 2.63 0.009 .0232176 .1599896
Notes: Standard error estimates may be more variable for small datasets and datasets with low proportions of interval-censored observations. Variables in tvc equation interacted with _t. Wald test that [tvc] = 0: chi2(5) = 13.3282 Prob > chi2 = 0.0205

The first equation, main, reports the coefficients for the covariates that do not vary over time; the second equation, tvc, reports the results for covariates interacted with time. And the proprtional-hazards assumption appears to be violated for covariates male and jail according to their p-values reported in the tvc equation. The Wald test at the bottom of the table also indicates that the proportional-hazards assumption doesn't hold globally.

Fitting an interval-censored Cox model using TVCs in multiple-record data

In the previous example, we used stintcox's tvc() option to create TVCs to test the proportional-hazards assumption. In some applications, TVCs already exist in the dataset. Continuing with the data on injecting drug users, being in jail is a TVC. TVCs can be recorded only in multiple-record-per-subject data format. In this format, each subject may contain multiple records with multiple examination times, the event status at each examination time, and potential time-varying covariates at each examination time.

Let's use an extended version, idu2.dta, of the previous idu.dta dataset, which contains all the baseline covariates as well as a time-varying imprisonment indicator variable (jail_vary). jail_vary indicates whether the subject has been imprisioned since the last clinic visit. The dataset also records a subject identifier (id), the examination time of the blood test (time), and whether the blood test is positive at the examination time (is_seropos). Here is a subset of this dataset for subjects 271 through 274 with multiple records per subject:

. use https://www.stata-press.com/data/r18/idu2
(Modified Bangkok IDU Preparatory Study with time-varying variable jail_vary)

. format time age_mean %6.2f

. list id time is_seropos age_mean male needle inject jail_vary
     if id >= 271 & id <= 274, sepby(id) noobs abbreviate(10) 

id time is_seropos age_mean male needle inject jail_vary
271 4.89 No -6.46 Yes Yes No No
271 9.31 No -6.46 Yes Yes No No
271 13.38 No -6.46 Yes Yes No Yes
271 17.97 No -6.46 Yes Yes No Yes
271 22.00 No -6.46 Yes Yes No No
272 3.80 No 8.54 No No No Yes
272 9.41 Yes 8.54 No No No No
273 3.93 No -11.46 Yes Yes No No
273 8.00 No -11.46 Yes Yes No No
273 12.07 No -11.46 Yes Yes No Yes
273 15.97 No -11.46 Yes Yes No Yes
273 20.66 No -11.46 Yes Yes No Yes
274 3.87 Yes -4.46 Yes Yes Yes Yes

We refit the previous Cox model, but we now use the time-varying variable jail_vary instead of the baseline variable jail. In a multiple-record-per-subject format, we specify the subject identifier in option id(), the examination time in option time(), and the event-indicator status in option status().

. stintcox age_mean i.male i.needle i.inject i.jail_vary, id(id) time(time)   
     status(is_seropos)
note: time-varying covariates detected in the data; using method nearleft to
      impute their values between examination times.
note: using adaptive step size to compute derivatives.

Performing EM optimization (showing every 100 iterations):
Iteration 0:   Log likelihood = -1086.2564
Iteration 100: Log likelihood = -598.45375
Iteration 200: Log likelihood = -598.35872
Iteration 285: Log likelihood = -598.34887

Computing standard errors: ........................................ done

Interval-censored Cox regression                   Number of obs      =  6,453
Baseline hazard: Reduced intervals                 Number of subjects =  1,124
                                                           Uncensored =      0
ID variable: id                                         Left-censored =     41
Examination time: time                                 Right-censored =    991
Status indicator: is_seropos                           Interval-cens. =     92

                                                   Wald chi2(5)       =  17.03
Log likelihood = -598.34887                        Prob > chi2        = 0.0044

OPG
time Haz. ratio std. err. z P>|z| [95% conf. interval]
age_mean .9714605 .012757 -2.20 0.027 .9467762 .9967884
male
Yes .6678044 .1816576 -1.48 0.138 .3918353 1.138138
needle
Yes 1.271409 .2275426 1.34 0.180 .8952546 1.805609
inject
Yes 1.370672 .2575405 1.68 0.093 .9484142 1.980928
jail_vary
Yes 1.440966 .2916178 1.81 0.071 .9691488 2.142481
Time varying: jail_vary Note: Standard error estimates may be more variable for small datasets and datasets with low proportions of interval-censored observations.

Compared with the previous example, after we account for time-varying imprisonment, the hazard ratio for inject increases from 1.25 to 1.37, but the effect of imprisonment decreases from 1.57 for baseline jail to 1.44 for time-varying jail_vary.

Graphing survivor functions

After fitting the model, we can use stcurve to plot the estimated survivor, failure, hazard, or cumulative hazard function. By default, stcurve evaluates the functions at the overall means of covariates.

. stcurve, survival
note: function evaluated at overall means of covariates.

With multiple-record data with TVCs, we may want to incorporate the time-varying nature of those covariates when plotting the functions. In this case, we can use option attmeans to evaluate the function at time-specific means.

. stcurve, survival attmeans
note: function evaluated at time-specific means of covariates.

We can also use option atframe() to specify our own TVC values to be used to evaluate the survivor function. Suppose we want to plot the survivor curve for an individual with the same covariate pattern as subject 1 in our dataset. We create a new frame called id1 and use frame put to copy the relevant information for subject 1 to this new frame. We list the data we just saved in frame id1.

. frame put time age_mean male needle inject jail_vary if id==1, into(id1)

. frame id1: list

time age_mean male needle inject jail_v~y
1. 4.13 5.54 Yes No Yes Yes
2. 8.26 5.54 Yes No Yes No
3. 12.33 5.54 Yes No Yes No
4. 16.10 5.54 Yes No Yes No
5. 20.10 5.54 Yes No Yes No
6. 24.23 5.54 Yes No Yes No
7. 28.62 5.54 Yes No Yes No
8. 32.59 5.54 Yes No Yes No
9. 36.20 5.54 Yes No Yes No
10. 40.56 5.54 Yes No Yes Yes

Then we can graph the survivor curve for this particular profile by typing

. stcurve, survival atframe(id1)
note: function evaluated at specified values of selected covariates and overall
      means of other covariates (if any).
note: covariate values from frame id1 used to evaluate function.

Reference

Zeng, D., L. Mao, and D. Lin. 2016. Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103: 253—271.

Tell me more

Learn more about other new features in survival analysis.

Read more in the Stata Survival Analysis Reference Manual; see [ST] stintcox.

View all the new features in Stata 18.

Made for data science.

Get started today.