I would like to report what I believe is an erroneous result that is
generated by Stata 7.
After the estimation of a Cox proportional hazard model (stcox), I obtain
a baseline cumulative hazard function that is NOT monotonically increasing
with time. In particular, I obtain:
_t H
59 1083 .00022798
60 1152 .00023376
61 1165 .00023946
62 1170 .00024571
63 1191 .00001006
64 1191 .00001006
65 1212 .00026592
66 1217 .00027271
67 1235 .00027947
where H is the cumulative baseline hazard.
As we can observe in observations 63 and 64, the cumulative baseline
hazard H decreases.
These results are obtained by estimating a model on the Primary Biliary
Cirrhosis, sequential data from the Mayo Clinic. I obtained the data from
Terry Therneau's webpage.
I replicate the model at page 269 of Therneau and Grambsch's Modeling
Survival Data.
stcox age1 lnbili lnprotime lnalbumin edema, efron nohr basechazard(H)
Cox regression -- Efron method for ties
No. of subjects = 312 Number of obs = 1945
No. of failures = 140
Time at risk = 730592
LR chi2(5) = 474.21
Log likelihood = -489.4345 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t |
_d | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age1 | .0460066 .0089065 5.17 0.000 .0285501 .063463
lnbili | 1.084906 .1111168 9.76 0.000 .8671211 1.302691
lnprotime | 2.848418 .6316613 4.51 0.000 1.610384 4.086451
lnalbumin | -3.719251 .4952839 -7.51 0.000 -4.68999 -2.748512
edema | .8059015 .2327012 3.46 0.001 .3498155 1.261987
------------------------------------------------------------------------------
The problem occurs three times in this data set, and it occurs in the presence
of tied failure events. If I plot the cumulative hazard function H against
time, I obtain the characteristic plot but with three spikes pointing
towards the x-axis.
This problem doesn't occur if I estimate the same model using the breslow
method for tied events.
I would be really grateful if anyone could clarify whether this is a bug
or just my misunderstanding.
Thank you very much for your attention,
giacomo
Here's the do-file that generates the data set and the analysis:
clear
infile id fudays status drug age sex day ascites hepatom spiders edema /*
*/ bili chol albumin alkphos sgot /*
*/ platelet protime stage using "C:\Data\pbcseq0.txt"
drop in 1
sort id day
by id: gen stop=day[_n+1]
replace stop=fudays if stop==.
gen start=day
gen event=0
replace event=status if stop==fudays
sort id day
order id start stop event
outsheet using "C:\My Documents\leaders\data\pbcseq.txt", replace
save "C:\My Documents\leaders\data\pbcseq.dta", replace
clear
use "C:\My Documents\leaders\data\pbcseq.dta"
stset stop, enter(start) id(id) failure(event==2)
gen age1=age/365.25
gen lnbili=ln(bili)
gen lnprotime=ln(protime)
gen lnalbumin=ln(albumin)
stcox age1 lnbili lnprotime lnalbumin edema, efron nohr basechazard(H)
gsort -event _t
graph H _t, sort c(J)
stcox age1 lnbili lnprotime lnalbumin edema, nohr basechazard(Hb)
graph Hb _t, sort c(J)
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/