Structural equation models with survival outcomes
Latent predictors of survival outcomes
Path models, growth curve models, and more
Multilevel models — random intercepts and random coefficients
Survival outcomes with other outcomes
Right-censoring
Left-truncation
Parametric models — exponential, loglogistic, Weibull, lognormal, and gamma survival distributions
Support for complex survey data
Marginal predictions and marginal means
Survival-time outcomes measure durations such as time to death, length of hospital stay, time to recurrence of a particular type of cancer, or length of time living in a city. We can fit survival models to evaluate the effects of covariates on these survival times.
For example, we might be interested in high school students who are at risk of dropping out of school. We can model the number of months from the start of ninth grade until dropout as a function of grade point average, attendance, and eighth-grade achievement test scores. We can fit either a parametric or a semiparametric model with Stata's streg or stcox commands.
For more complex models, we can use gsem to model survival-time outcomes as part of a larger structural equation model. gsem fits multilevel structural equation models and structural equation models with binary, ordinal, count, and other types of outcomes. What does this mean for modeling survival-time outcomes? Let's consider some extensions to our model for time to dropout.
Suppose that at the beginning of ninth grade, each at-risk student in our sample answered a series of questions regarding the support each receives from parents, teachers, counselors, and peers. The responses to these questions measure the student's perceived level of support, a latent concept that we believe affects time to dropout. gsem allows us to include a latent variable as a predictor of time to dropout.
Perhaps we believe that time to dropout is also affected by unobserved characteristics of the school such as an administrator's abilities to identify early warning signs of dropout. With gsem, we can include school-level random effects in our model.
Say that we are also interested in modeling a count outcome, the number of times a student has been in trouble at school. We believe this outcome is also related to the student's perceived level of support. With gsem, we can simultaneously fit a Weibull model for time to dropout and a Poisson model for number of offenses at school, and the two models can be tied together through the student's perceived level of support. We can also include number of offenses as a predictor of time to dropout.
In other words, gsem allows us to extend parametric models for survival-time outcomes to include latent variables, to include multiple levels of random effects, and to include multiple outcomes.
Now, let's consider another type of survival-time outcome.
We want to analyze survival times of nursing home residents. We have censored data; thankfully, not all the residents have died yet.
We posit that survival times are determined by age, depression level, and overall health. Both depression and overall health are latent traits. We have four variables that measure aspects of depression and four variables that measure aspects of overall health. We fit a Weibull model for survival time along with the measurement models for depression and overall health.
We can create our model using Stata's SEM Builder:
Or we can type the command
. gsem (surv_time <- age Depress Health, family(weibull, fail(death))) (Depress -> dep1 dep2 dep3 dep4) (Health -> hlth1 hlth3 hlth3 hlth4), variance(Depress@1 Health@1)
The results are
Coefficient Std. err. z P>|z| [95% conf. interval] | ||
surv_time | ||
age | .1078273 .0120192 8.97 0.000 .0842701 .1313844 | |
Depress | 1.2204 .1143472 10.67 0.000 .9962836 1.444516 | |
Health | -1.782012 .1526019 -11.68 0.000 -2.081106 -1.482918 | |
_cons | -6.949501 .8724378 -7.97 0.000 -8.659448 -5.239554 | dep1 |
Depress | 1.038601 .0409685 25.35 0.000 .9583046 1.118898 | |
_cons | .091441 .0510287 1.79 0.073 -.0085734 .1914553 | dep2 |
Depress | .5079995 .0381469 13.32 0.000 .433233 .582766 | |
_cons | .0439719 .039877 1.10 0.270 -.0341856 .1221295 | dep3 |
Depress | .7126734 .0370931 19.21 0.000 .6399723 .7853746 | |
_cons | .0856129 .0421749 2.03 0.042 .0029516 .1682743 | dep4 |
Depress | 1.248634 .055827 22.37 0.000 1.139215 1.358053 | |
_cons | .0591629 .0665563 0.89 0.374 -.0712852 .1896109 | hlth1 |
Depress | 1.062984 .0414027 25.67 0.000 .9818361 1.144132 | |
_cons | -.0424677 .0523266 -0.81 0.417 -.1450259 .0600906 | hlth2 |
Depress | .4933346 .038082 12.95 0.000 .4186953 .5679738 | |
_cons | -.0357306 .0398651 -0.90 0.370 -.1138647 .0424035 | hlth3 |
Depress | .7293404 .0362297 20.13 0.000 .6583315 .8003493 | |
_cons | .0165731 .0419967 0.39 0.693 -.0657389 .0988851 | hlth4 |
Depress | 1.273872 .0550442 23.14 0.000 1.165988 1.381757 | |
_cons | -.0487405 .0668903 -0.73 0.466 -.179843 .0823621 | /surv_time |
ln_p | -.5314218 .0716854 -.6719227 -.390921 | |
var(Depress) | 1 (constrained) | |
var(Health) | 1 (constrained) | |
cov(Depress,Health) | .0411 .0495383 0.83 0.407 -.0559932 .1381932 | |
var(e.dep1) | .2232991 .0293288 .1726185 .2888595 | |
var(e.dep2) | .5370324 .0357861 .4712802 .6119583 | |
var(e.dep3) | .3814731 .0282049 .3300113 .4409598 | |
var(e.dep4) | .6558288 .0570452 .5530335 .7777311 | |
var(e.hlth1) | .2397716 .0264884 .1930913 .297737 | |
var(e.hlth2) | .5513769 .0362948 .4846381 .6273062 | |
var(e.hlth3) | .350238 .0257087 .3033068 .404431 | |
var(e.hlth4) | .6153669 .0515441 .5221991 .7251571 | |
gsem reports coefficients. Because we fit a Weibull model for surv_time, exponentiated coefficients can be interpreted as hazard ratios and are reported by estat eform.
. estat eform surv_time
surv_time | exp(b) Std. err. z P>|z| [95% conf. interval] | |
age | 1.113855 .0133876 8.97 0.000 1.087923 1.140406 | |
Depress | 3.388543 .3874702 10.67 0.000 2.708198 4.2398 | |
Health | .1682992 .0256828 -11.68 0.000 .1247921 .2269745 | |
_cons | .0009591 .0008368 -7.97 0.000 .0001735 .0053026 | |
Depress and age have hazard ratios greater than 1; being more depressed and being older both correspond to increased hazard and thus decreased survival times. Health has a hazard ratio less than 1; better overall health corresponds to increased survival times.
We use margins to compute marginal mean survival times for ages ranging from 70 to 85.
. margins, at(age=(70(3)85)) predict(mu outcome(surv_time)) noatlegend Adjusted predictions Number of obs = 500 Model VCE: OIM Expression: Marginal predicted mean (analysis time when record ends), predict(mu outcome(surv_time))
Delta-method | ||
Margin std. err. z P>|z| [95% conf. interval] | ||
_at | ||
1 | 5.263968 1.965496 2.68 0.007 1.411666 9.116271 | |
2 | 3.035966 1.111428 2.73 0.006 .8576069 5.214325 | |
3 | 1.750977 .6397585 2.74 0.006 .4970736 3.004881 | |
4 | 1.009867 .3749436 2.69 0.007 .2749909 1.744743 | |
5 | .5824353 .2233669 2.61 0.009 .1446443 1.020226 | |
6 | .3359165 .1348604 2.49 0.013 .0715949 .600238 | |
We can plot these marginal means using marginsplot.
As expected, the predicted marginal mean survival time decreases with age.
You can read more about Stata's SEM features and see several worked examples in Structural Equation Modeling Reference Manual.