I apologize for the length of this post, but I have a problem that has many
of us at work stumped.
I have the Stata Press book "An Introduction to Survival Analysis Using
Stata" and I am trying to perform a survival analysis using -tvc-. I have
persons who all have hepatitis C (a risk factor for end-stage liver disease,
ESLD), who eventually become HIV+ and then eventually start taking
drugs for HIV (ART). HIV and ART are both risk factors for ESLD. My
survival analysis uses the date of hep C infection as the origin and the
date of ESLD, or the last date of follow-up, as the endpoint. I have dates of
HIV and ART as well. I set up my data with four observations per person: one
for the hep C date, one for the HIV date, one for the ART date, and one for
the ESLD or last follow-up date. I also have age at hep C infection (agebase).
+---------------------------------------------------------+
| id   visit     examdat    exitdate   hiv     art   esld |
|---------------------------------------------------------|
|  1       1   15jun1970   15jun1983    no   No Rx     no |
|  1       2   15jun1983   15jun1988   yes   No Rx     no |
|  1       3   15jun1988   14dec1988   yes     ART     no |
|  1       4   15jun1989   15jun1989   yes     ART    yes |
|---------------------------------------------------------|
|  2       1   15jun1970   14jun1981    no   No Rx     no |
|  2       2   14jun1981   15jun1987   yes   No Rx     no |
|  2       3   15jun1987   14jun1988   yes     ART     no |
|  2       4   14jun1989   14jun1989   yes     ART    yes |
|---------------------------------------------------------|
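For anyone who wants to reproduce the setup, a minimal sketch of how data laid out like this could be declared as multiple-record survival data follows. It is not necessarily the exact command used here: hepcdate is a hypothetical helper variable, esld is assumed to be coded 0 = no and 1 = yes, and examdat and exitdate are assumed to be numeric Stata daily dates.

```stata
* Sketch only; hepcdate is a hypothetical helper variable, and esld is
* assumed to be coded 0 = no, 1 = yes.

* Date of hep C infection = start date of each person's first record
bysort id (visit): generate hepcdate = examdat[1]
format hepcdate %td

* One record per interval, ending at exitdate; failure is flagged on the
* record in which ESLD occurs. Analysis time is years since hep C infection.
stset exitdate, id(id) failure(esld==1) origin(time hepcdate) scale(365.25)

* Inspect how each person's records became (_t0, _t] intervals
list id visit _t0 _t _d hiv art if id <= 2, sepby(id)
```

Because time0() is not specified, each record is assumed to start where the same person's previous record ended, so it is worth checking the listed _t0 and _t values against the dates in the table.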
The example in the book, though, uses a continuous variable whose values
change over time. My data, however, use a categorical variable that
changes over time. I assumed this did not matter; what is
important is that the variable varies with time.
I'm having trouble trusting my results, though. When I analyze the data
using -tvc-, hiv and art do NOT appear to increase the risk of ESLD (I
realize I have a power issue):
stcox agebase, tvc(hiv art) mgale(mg) schoenfeld(sc*) scaledsch(ssc*)
No. of subjects =          157                   Number of obs    =        628
No. of failures =           24
Time at risk    =         3913
                                                 LR chi2(3)       =      31.56
Log likelihood  =    -93.64632                   Prob > chi2      =     0.0000
------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
rh           |
     agebase |   1.776424   .2582205     3.95   0.000      1.33603    2.361985
-------------+----------------------------------------------------------------
t            |
         hiv |   1.041784   .0412923     1.03   0.302     .9639167    1.125943
         art |   1.062477   .0416937     1.54   0.123     .9838228     1.14742
------------------------------------------------------------------------------
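For reference, my understanding from the -stcox- documentation is that tvc() interacts each listed variable with a function of analysis time (texp(), which defaults to _t). If so, the model above would be the following; the notation is my own and I may be misreading the manual:

```latex
h(t \mid \mathbf{x}) = h_0(t)\,
  \exp\bigl(\beta\,\mathrm{agebase}
          + \gamma_1\,\mathrm{hiv}\cdot t
          + \gamma_2\,\mathrm{art}\cdot t\bigr)
```

Under that reading, the "rh" equation reports exp(beta) and the "t" equation reports exp(gamma_1) and exp(gamma_2), which would describe how the hiv and art effects change per unit of analysis time.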
stphtest, rank detail
Test of proportional hazards assumption
Time: Rank(t)
----------------------------------------------------------------
| rho chi2 df Prob>chi2
------------+---------------------------------------------------
agebase | -0.38675 1.72 1 0.1902
hiv | -0.23011 0.70 1 0.4015
art | 0.36422 1.04 1 0.3079
------------+---------------------------------------------------
global test | 2.88 3 0.4108
----------------------------------------------------------------
So there doesn't appear to be any evidence that my model violates the
proportional hazards assumption.
If I don't use -tvc-, I get very different hazard ratios:
stcox agebase hiv art
------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     agebase |   1.786311   .2587296     4.01   0.000     1.344835    2.372714
         hiv |   3.392652   2.445086     1.70   0.090      .826186    13.93159
         art |   3.892577   3.060481     1.73   0.084     .8336669    18.17531
------------------------------------------------------------------------------
Did I assume correctly that I should use the -tvc- option and therefore
trust those results? If so, what about proportional hazards? I read the
chapter on time-dependent variables in Collett's book "Modelling Survival
Data in Medical Research" and I understood him to say that when a variable X
depends on time t, the relative hazard is also time dependent. Therefore
the hazard of death at time t is no longer proportional to the baseline
hazard, and the model is no longer a proportional hazards model. I
interpret this to mean that whenever one uses -tvc-, the assumption of
proportional hazards is always violated. Is this correct? If so, why would
-stphtest- indicate that there was no violation?
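To make my reading of Collett concrete, this is the model as I understand it. The notation is my own paraphrase rather than a quotation from the book: x_{ji}(t) is the value of covariate j for person i at time t.

```latex
h_i(t) = h_0(t)\,\exp\Bigl(\sum_{j} \beta_j\, x_{ji}(t)\Bigr)
\qquad\Longrightarrow\qquad
\frac{h_i(t)}{h_0(t)} = \exp\Bigl(\sum_{j} \beta_j\, x_{ji}(t)\Bigr)
```

Because x_{ji}(t) changes with t, the ratio on the right is itself a function of t, which is what I take "no longer a proportional hazards model" to mean.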
-------------------------------------------
Joseph Wagner, MPH
130 Desoto Street
Parran Hall Room 134
Epidemiology Data Center
Graduate School of Public Health
University of Pittsburgh
Pittsburgh, PA 15261
PHONE:(412) 624-5295
FAX: (412) 624-3775
Email: [email protected]
http://www.edc.gsph.pitt.edu
-------------------------------------------