Survival models for SEM

Highlights

  • Structural equation models with survival outcomes

    • Latent predictors of survival outcomes

    • Path models, growth curve models, and more

    • Multilevel models — random intercepts and random coefficients

    • Survival outcomes with other outcomes

  • Right-censoring

  • Left-truncation

  • Parametric models — exponential, loglogistic, Weibull, lognormal, and gamma survival distributions

  • Support for complex survey data

  • Marginal predictions and marginal means

Survival-time outcomes measure durations such as time to death, length of hospital stay, time to recurrence of a particular type of cancer, or length of time living in a city. We can fit survival models to evaluate the effects of covariates on these survival times.

For example, we might be interested in high school students who are at risk of dropping out of school. We can model the number of months from the start of ninth grade until dropout as a function of grade point average, attendance, and eighth-grade achievement test scores. We can fit a parametric model with Stata's streg command or a semiparametric Cox model with stcox.
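As a minimal sketch of what those two fits might look like, assume a dataset with hypothetical variables months (time to dropout), dropout (1 if the student dropped out, 0 if censored), gpa, attendance, and test8. None of these names come from a real dataset; they only illustrate the syntax.

```stata
* Hypothetical variable names; substitute the names in your own data.
* Declare the survival-time structure: months is the analysis time,
* dropout == 1 marks an observed dropout (0 = right-censored).
stset months, failure(dropout)

* Parametric model: Weibull proportional hazards
streg gpa attendance test8, distribution(weibull)

* Semiparametric model: Cox proportional hazards
stcox gpa attendance test8
```

By default, both commands report hazard ratios, so the two sets of estimates can be compared directly.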

For more complex models, we can use gsem to model survival-time outcomes as part of a larger structural equation model. gsem fits multilevel structural equation models and structural equation models with binary, ordinal, count, and other types of outcomes. What does this mean for modeling survival-time outcomes? Let's consider some extensions to our model for time to dropout.

Suppose that at the beginning of ninth grade, each at-risk student in our sample answered a series of questions regarding the support each receives from parents, teachers, counselors, and peers. The responses to these questions measure the student's perceived level of support, a latent concept that we believe affects time to dropout. gsem allows us to include a latent variable as a predictor of time to dropout.

Perhaps we believe that time to dropout is also affected by unobserved characteristics of the school such as an administrator's abilities to identify early warning signs of dropout. With gsem, we can include school-level random effects in our model.

Say that we are also interested in modeling a count outcome, the number of times a student has been in trouble at school. We believe this outcome is also related to the student's perceived level of support. With gsem, we can simultaneously fit a Weibull model for time to dropout and a Poisson model for number of offenses at school, and the two models can be tied together through the student's perceived level of support. We can also include number of offenses as a predictor of time to dropout.

In other words, gsem allows us to extend parametric models for survival-time outcomes to include latent variables, to include multiple levels of random effects, and to include multiple outcomes.
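The three extensions described above can be sketched as follows. All variable names (months, dropout, gpa, offenses, school, and the support items s1 through s4) are hypothetical, and the support items are assumed to be treated as continuous measurements. This is a sketch of the syntax under those assumptions, not a fitted example.

```stata
* Hypothetical variable names throughout.

* 1. Latent predictor: perceived support (Support) is measured by four
*    items and enters the Weibull model for time to dropout.
gsem (months <- gpa Support, family(weibull, failure(dropout))) ///
     (Support -> s1 s2 s3 s4)

* 2. Multilevel model: a school-level random intercept, M1[school].
gsem (months <- gpa M1[school], family(weibull, failure(dropout)))

* 3. Multiple outcomes: a Weibull model for time to dropout and a
*    Poisson model for number of offenses, linked through Support;
*    offenses is also a predictor of time to dropout.
gsem (months <- gpa offenses Support, family(weibull, failure(dropout))) ///
     (offenses <- Support, family(poisson))                              ///
     (Support -> s1 s2 s3 s4)
```

Unlike streg and stcox, gsem does not require the data to be stset first; the failure indicator is supplied through the failure() suboption of family().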

Let's see it work

Now, let's consider another type of survival-time outcome.

We want to analyze survival times of nursing home residents. We have censored data; thankfully, not all the residents have died yet.

We posit that survival times are determined by age, depression level, and overall health. Both depression and overall health are latent traits. We have four variables that measure aspects of depression and four variables that measure aspects of overall health. We fit a Weibull model for survival time along with the measurement models for depression and overall health.

We can create our model using Stata's SEM Builder:

[Figure: path diagram of the model in the SEM Builder]

Or we can type the command

. gsem (surv_time <- age Depress Health, 
     family(weibull, fail(death)))
     (Depress -> dep1 dep2 dep3 dep4)
     (Health -> hlth1 hlth2 hlth3 hlth4),
     variance(Depress@1 Health@1)

The results are

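--------------------------------------------------------------------------------------
                    | Coefficient  Std. err.      z     P>|z|     [95% conf. interval]
--------------------+-----------------------------------------------------------------
surv_time           |
                age |    .1078273   .0120192     8.97   0.000     .0842701    .1313844
            Depress |      1.2204   .1143472    10.67   0.000     .9962836    1.444516
             Health |   -1.782012   .1526019   -11.68   0.000    -2.081106   -1.482918
              _cons |   -6.949501   .8724378    -7.97   0.000    -8.659448   -5.239554
--------------------+-----------------------------------------------------------------
dep1                |
            Depress |    1.038601   .0409685    25.35   0.000     .9583046    1.118898
              _cons |     .091441   .0510287     1.79   0.073    -.0085734    .1914553
--------------------+-----------------------------------------------------------------
dep2                |
            Depress |    .5079995   .0381469    13.32   0.000      .433233     .582766
              _cons |    .0439719    .039877     1.10   0.270    -.0341856    .1221295
--------------------+-----------------------------------------------------------------
dep3                |
            Depress |    .7126734   .0370931    19.21   0.000     .6399723    .7853746
              _cons |    .0856129   .0421749     2.03   0.042     .0029516    .1682743
--------------------+-----------------------------------------------------------------
dep4                |
            Depress |    1.248634    .055827    22.37   0.000     1.139215    1.358053
              _cons |    .0591629   .0665563     0.89   0.374    -.0712852    .1896109
--------------------+-----------------------------------------------------------------
hlth1               |
             Health |    1.062984   .0414027    25.67   0.000     .9818361    1.144132
              _cons |   -.0424677   .0523266    -0.81   0.417    -.1450259    .0600906
--------------------+-----------------------------------------------------------------
hlth2               |
             Health |    .4933346    .038082    12.95   0.000     .4186953    .5679738
              _cons |   -.0357306   .0398651    -0.90   0.370    -.1138647    .0424035
--------------------+-----------------------------------------------------------------
hlth3               |
             Health |    .7293404   .0362297    20.13   0.000     .6583315    .8003493
              _cons |    .0165731   .0419967     0.39   0.693    -.0657389    .0988851
--------------------+-----------------------------------------------------------------
hlth4               |
             Health |    1.273872   .0550442    23.14   0.000     1.165988    1.381757
              _cons |   -.0487405   .0668903    -0.73   0.466     -.179843    .0823621
--------------------+-----------------------------------------------------------------
/surv_time          |
               ln_p |   -.5314218   .0716854                     -.6719227    -.390921
--------------------+-----------------------------------------------------------------
       var(Depress) |           1  (constrained)
        var(Health) |           1  (constrained)
--------------------+-----------------------------------------------------------------
cov(Depress,Health) |       .0411   .0495383     0.83   0.407    -.0559932    .1381932
--------------------+-----------------------------------------------------------------
        var(e.dep1) |    .2232991   .0293288                      .1726185    .2888595
        var(e.dep2) |    .5370324   .0357861                      .4712802    .6119583
        var(e.dep3) |    .3814731   .0282049                      .3300113    .4409598
        var(e.dep4) |    .6558288   .0570452                      .5530335    .7777311
       var(e.hlth1) |    .2397716   .0264884                      .1930913     .297737
       var(e.hlth2) |    .5513769   .0362948                      .4846381    .6273062
       var(e.hlth3) |     .350238   .0257087                      .3033068     .404431
       var(e.hlth4) |    .6153669   .0515441                      .5221991    .7251571
--------------------------------------------------------------------------------------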

gsem reports coefficients. Because we fit a Weibull model for surv_time, exponentiated coefficients can be interpreted as hazard ratios and are reported by estat eform.

. estat eform surv_time

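--------------------------------------------------------------------------------------
          surv_time |      exp(b)  Std. err.      z     P>|z|     [95% conf. interval]
--------------------+-----------------------------------------------------------------
                age |    1.113855   .0133876     8.97   0.000     1.087923    1.140406
            Depress |    3.388543   .3874702    10.67   0.000     2.708198      4.2398
             Health |    .1682992   .0256828   -11.68   0.000     .1247921    .2269745
              _cons |    .0009591   .0008368    -7.97   0.000     .0001735    .0053026
--------------------------------------------------------------------------------------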

Depress and age have hazard ratios greater than 1; being more depressed and being older both correspond to increased hazard and thus decreased survival times. Health has a hazard ratio less than 1; better overall health corresponds to increased survival times.
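To see where these hazard ratios come from, here is a short worked derivation. It assumes the Weibull model is in the proportional-hazards parameterization with shape parameter p = exp(ln_p), which is consistent with the hazard-ratio interpretation given above. The symbols h, p, x, and β are notation introduced here for illustration.

```latex
% Weibull proportional-hazards model (assumed parameterization)
h(t \mid x) = p\, t^{\,p-1} \exp(x\beta)

% A one-unit increase in covariate x_j multiplies the hazard by exp(beta_j)
\frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = \exp(\beta_j)

% Using the reported estimates
\exp(0.1078273) \approx 1.1139 \quad \text{(age)}, \qquad
p = \exp(-0.5314218) \approx 0.588
```

Under this parameterization, each additional year of age raises the hazard by about 11%, and an estimated shape parameter below 1 implies a baseline hazard that declines over time.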

We use margins to compute marginal mean survival times at ages 70 through 85 in 3-year steps (70, 73, 76, 79, 82, and 85), which the output labels _at 1 through 6.

. margins, at(age=(70(3)85)) predict(mu outcome(surv_time)) noatlegend


Adjusted predictions                            Number of obs     =        500
Model VCE: OIM

Expression: Marginal predicted mean (analysis time when record ends),
            predict(mu outcome(surv_time))

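--------------------------------------------------------------------------------------
                    |            Delta-method
                    |      Margin  std. err.      z     P>|z|     [95% conf. interval]
--------------------+-----------------------------------------------------------------
                _at |
                  1 |    5.263968   1.965496     2.68   0.007     1.411666    9.116271
                  2 |    3.035966   1.111428     2.73   0.006     .8576069    5.214325
                  3 |    1.750977   .6397585     2.74   0.006     .4970736    3.004881
                  4 |    1.009867   .3749436     2.69   0.007     .2749909    1.744743
                  5 |    .5824353   .2233669     2.61   0.009     .1446443    1.020226
                  6 |    .3359165   .1348604     2.49   0.013     .0715949     .600238
--------------------------------------------------------------------------------------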

We can plot these marginal means using marginsplot.

As expected, the predicted marginal mean survival time decreases with age.
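A minimal sketch of the plotting step is below. It assumes the margins command shown above was the most recent command run, because marginsplot graphs the results that margins leaves behind. The axis titles are illustrative choices, not taken from the original graph.

```stata
* Must be run immediately after the margins command above.
* Axis titles are illustrative; omit the options to use the defaults.
marginsplot, xtitle("Age") ytitle("Predicted mean survival time")
```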

Tell me more

You can read more about Stata's SEM features and see several worked examples in the Stata Structural Equation Modeling Reference Manual.