Emelda Okiro <[email protected]> wonders why her residual
analysis after -stcox- is failing:
> This is what my data looks like
[listing of multiple observation per subject data]
> . stset date_visit, id (rsv) failure(lrti) enter(time
> date_origin)origin(time d(31jan2002)) exit(time date_exit) scale(1)
[output omitted]
> . quietly xi: stcox i.currentagegrp sex i.siblings_un6 i.main_fuel
> i.hse_toilet i.babies_bor i.education i.family_children
> i.interaction_un6 i.siblingssch_un6 i.siblingsroom_ov6 i.female_sibs
> poor i.weaning i.job_desc, nohr mgale(mg)
> . predict cs, csnell
> . stset cs, failure(lrti)
which then signals a PROBABLE ERROR.
When you have multiple observations per subject, you need a residual that
is subject-specific: one that is summed over the observations within each
subject. These are known as cumulative Cox-Snell residuals, and you obtain
them with the -ccsnell- option of -predict- (note the extra c in ccsnell).
Once you get the cumulative Cox-Snell residuals, you can proceed with
-stset-ting them and so on. When you -stset- them, you can just ignore the
new PROBABLE ERROR message that you get due to missing values in the
newly-created variable (cumulative Cox-Snell residuals are only recorded
in one observation per subject).
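
As a rough sketch of that step, assuming the -stcox- model from the
original post has just been fit with -mgale(mg)-: the variable name ccs
is only a name chosen here for illustration, and the option names follow
the syntax used in this thread, so check -help stcox- if your release
differs.

```stata
* assumes -stcox ..., mgale(mg)- has just been run on the multiple-record data
* ccs is a hypothetical variable name chosen for this sketch
predict ccs, ccsnell        // cumulative Cox-Snell residuals, one value per subject
count if missing(ccs)       // records that do not carry a residual
stset ccs, failure(lrti)    // the missing-value PROBABLE ERROR here is expected
```

The -count- line is there only to show how many records lack a residual,
which should account for the observations that -stset- flags.
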
To answer Nick's question, you -stset- Cox-Snell residuals so that you can
-sts generate- an estimated cumulative hazard for these residuals. Under a
well-fitting model, the Cox-Snell residuals behave like a (censored) sample
from the unit exponential distribution, for which the theoretical cumulative
hazard is the identity function, H(t) = t. As such, if you plot the estimated
cumulative hazard of the Cox-Snell residuals against the residuals
themselves, it _should_ look like a 45-degree line.
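
One way to draw that plot is sketched below. It assumes the data are
currently -stset- on the cumulative Cox-Snell residuals, held in a
variable called ccs here (a name picked for illustration), and H is
likewise just a name for the new variable.

```stata
* assumes the data are -stset- on the cumulative Cox-Snell residuals in ccs
sts generate H = na         // Nelson-Aalen estimate of the cumulative hazard
* plot H against the residuals, with the residuals against themselves
* as the 45-degree reference line
line H ccs ccs, sort ytitle("Estimated cumulative hazard") xtitle("Cox-Snell residual")
```

Some departure from the reference line in the right-hand tail is common,
because few subjects remain at risk at large residual values.
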
--Bobby
[email protected]