FAQ: Baseline hazard and baseline hazard contribution

Note: The following question and answer are based on an exchange that started on Statalist.

What is the relationship between baseline hazard and baseline hazard contribution?

Title		Baseline hazard and baseline hazard contribution
Author		William Gould, StataCorp

Question:

In Stata’s stcox model, I’ve noticed that it is now possible to obtain nonparametric estimates of the contribution to the baseline hazard (through the basehc() option in Stata 7 to 10 or through the postestimation command predict, basehc since Stata 11), but it is no longer possible to get nonparametric estimates of the baseline hazard itself (which used to be available through the basehazard() option in Stata 6). After reading Kalbfleisch and Prentice, I’m wondering if there is some equivocation in the use of the word “baseline” here. What is the relationship between baseline hazard and baseline hazard contribution?

Answer:

Yes, indeed there is some equivocation.

First, what used to be returned by the old (Stata 6) basehazard() option is exactly what was returned by the basehc() option in versions 7–10 and is created now by the postestimation command predict with the option basehc.

The problem was that what was returned by the old basehazard() option was not the baseline hazard, and neither is what basehc returns now; it is the numerator of the baseline hazard, called the hazard contribution by Kalbfleisch and Prentice (2002, p. 115, eq. 3–34). To convert what is returned to a baseline hazard, you could divide it by Delta_t, the time between failures. But don’t do that. I ran some simulations and quickly convinced myself that the contribution divided by Delta_t is a poor estimator of the baseline hazard. Results are much better if the estimate is based on the cumulative hazard, using smoothing followed by numerical differentiation.
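To make the contrast concrete, here is a sketch in symbols. The notation is an assumption introduced for illustration and is not part of the original exchange: h_j is the hazard contribution at the j-th distinct failure time t_j, K is a kernel that integrates to one, and b is a bandwidth. The smoothed form shown is one common kernel hazard smoother, not necessarily the exact estimator used in the simulations mentioned above.

```latex
% Naive estimator: contribution divided by the gap since the previous failure
\[
  \tilde h_0(t_j) = \frac{h_j}{\Delta t_j},
  \qquad \Delta t_j = t_j - t_{j-1}
\]

% Step-function cumulative hazard built from the contributions, and its
% kernel-smoothed derivative (one standard form; an assumption here)
\[
  \hat H_0(t) = \sum_{j:\, t_j \le t} h_j,
  \qquad
  \hat h_0(t) = \frac{1}{b} \sum_{j} K\!\left(\frac{t - t_j}{b}\right) h_j
\]
```

The naive estimator depends on a single gap Delta_t and so jumps erratically when failures are closely or widely spaced, whereas the smoothed version pools neighboring contributions.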

The command stcurve calculates and plots the smoothed hazard estimate. By default, stcurve plots the estimate at the means of the covariates:

. sysuse cancer, clear
(Patient survival in drug trial)

. stset studytime, failure(died)

Survival-time data settings

         Failure event: died!=0 & died<.
Observed time interval: (0, studytime]
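     Exit on or before: failure

--------------------------------------------------------------------------
         48  total observations
          0  exclusions
--------------------------------------------------------------------------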
         48  observations remaining, representing
         31  failures in single-record/single-failure data
        744  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        39


. stcox drug age, nolog

        Failure _d: died
  Analysis time _t: studytime

Cox regression with Breslow method for ties

No. of subjects =  48                                   Number of obs =     48
No. of failures =  31
Time at risk    = 744
                                                        LR chi2(2)    =  36.29
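Log likelihood = -81.765061                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        drug |   .2153648   .0676904    -4.89   0.000     .1163154    .3987605
         age |   1.116351   .0403379     3.05   0.002     1.040025    1.198279
------------------------------------------------------------------------------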

. stcurve, hazard
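
If you want the curve at covariate values other than the means, or with a different amount of smoothing, stcurve has options for both. The following is a minimal sketch, assuming the stcox model above is still the active estimation result; the covariate values, kernel, and width are illustrative choices, not values from the original exchange.

```stata
* Smoothed hazard at chosen covariate values instead of the means
* (drug=1 and age=56 are illustrative values only)
stcurve, hazard at(drug=1 age=56)

* Same curve with an explicitly chosen kernel and bandwidth
* (gaussian and 4 are arbitrary; by default stcurve picks the width itself)
stcurve, hazard at(drug=1 age=56) kernel(gaussian) width(4)
```

A wider bandwidth gives a smoother but flatter curve, so it is worth trying a few values before reading much into any bump.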

The command stcurve uses kernel density estimation to perform the smoothing referred to above. We can reproduce this by hand, using the baseline hazard contributions and the command kdensity to do the smoothing:

. sysuse cancer
(Patient survival in drug trial)

. stset studytime, failure(died)

Survival-time data settings

         Failure event: died!=0 & died<.
Observed time interval: (0, studytime]
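     Exit on or before: failure

--------------------------------------------------------------------------
         48  total observations
          0  exclusions
--------------------------------------------------------------------------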
         48  observations remaining, representing
         31  failures in single-record/single-failure data
        744  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        39


. stcox drug age, nolog

        Failure _d: died
  Analysis time _t: studytime

Cox regression with Breslow method for ties

No. of subjects =  48                                   Number of obs =     48
No. of failures =  31
Time at risk    = 744
                                                        LR chi2(2)    =  36.29
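Log likelihood = -81.765061                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        drug |   .2153648   .0676904    -4.89   0.000     .1163154    .3987605
         age |   1.116351   .0403379     3.05   0.002     1.040025    1.198279
------------------------------------------------------------------------------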

. predict hc0, basehc
(17 missing values generated)

. sum drug

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        drug |         48       1.875    .8410986          1          3

. replace drug=r(mean)
variable drug was byte now float
(48 real changes made)

. sum age

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         age |         48      55.875    5.659205         47         67

. replace age=r(mean)
variable age was byte now float
(48 real changes made)

. predict double xb, xb

. gen double hcmean = (1-(1-hc0)^exp(xb))
(17 missing values generated)

. drop if hc0==.
(17 observations deleted)

. sort _t

. by _t: keep if _n==1
(10 observations deleted)

. summ _t, meanonly

. local tmin = r(min)

. local tmax = r(max)

. local N = _N

. local N1 = `N' + 1

. local obs = `N'+101

. set obs `obs'
Number of observations (_N) was 21, now 122.

. gen t0 = `tmin' + (`tmax'-`tmin')*(_n-`N1')/100 in `N1'/l
(21 missing values generated)

. gen t1 = t0 if t0>=4.62 & t0<=28.38
(48 missing values generated)

. kdensity _t [iweight=hcmean] if _d, at(t1) generate(hmean) nograph

. twoway line hmean t1, ytitle("")
>                xtitle("analysis time")
>                title("Smoothed hazard estimate")

We can see that stcurve is doing a lot of work for us. First, it obtains the means of the covariates and calculates the hazard contributions at the mean. Next, it creates 101 equally spaced time points at which to calculate the smoothed hazard estimate. Finally, it uses kdensity to do the smoothing.
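
The step that moves the baseline contributions to the covariate means (the line that generates hcmean) may be easier to follow in symbols. This is a sketch under the usual proportional-hazards reading, with notation introduced here for illustration: h_j is the baseline hazard contribution at failure time t_j, so 1 - h_j is the baseline conditional probability of surviving t_j, and raising it to the power exp(x*beta) gives the corresponding probability at covariate values x.

```latex
% Conditional survival at covariates x is the baseline value raised to exp(x beta)
\[
  1 - h_j(x) = (1 - h_j)^{\exp(x\hat\beta)}
\]

% Evaluated at the covariate means, this is the quantity stored in hcmean
\[
  h_j(\bar x) = 1 - (1 - h_j)^{\exp(\bar x\hat\beta)}
\]
```

These adjusted contributions are then passed to kdensity as iweights. As I understand kdensity, importance weights are not rescaled to sum to one, so the result is a weighted sum of kernels centered at the failure times (a hazard) rather than a density.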

Reference

Kalbfleisch, J. D., and R. L. Prentice. 2002. The Statistical Analysis of Failure Time Data. 2nd ed. New York: Wiley.
