Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nicole Boyle <nicboyle@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Dropping right-censored spells in the Cox model |
Date | Sat, 14 Sep 2013 18:28:08 -0700 |
Hi Kai,

I'll take a stab at this.

I'm going to assume that the "censored spells" you're considering dropping do not include those censored due to the competing risk, but only include noninformative censorship: for subjects censored at time "c," the probability of survival past some future time t>c is the same as the probability of survival past t for subjects who were merely known to have survived past c. I'm also assuming the "censored spells" you're referring to don't include administratively censored subjects.

Given the assumption of noninformative censorship, in conjunction with a central property of Cox regression, I do NOT think dropping censored observations is a good idea.

Important to Cox regression is the maximum partial-likelihood estimator property, which dictates that hazard estimates are computed only at times of failure, using the risk pool available at each of those failure times. Everyone contributing to the "at risk" pool at time=t is weighted the same at time=t, regardless of future failure or future censorship.

Since subjects censored at time=t are considered just as likely to fail in the future as subjects NOT censored at time=t, and because censored subjects by definition NEVER fail in the data, dropping censored subjects from the study reduces the "# at risk" denominator at each failure time (where censored observations would normally have contributed to this denominator prior to censorship) while keeping the "# failed" numerator intact. This will very likely bias results.

I suggest that instead of thinking about the data in terms of [total # failed] / [total # at risk], which is more of a risk ratio or odds ratio mentality, you consider thinking of each failure time as an independently conducted prevalence estimate.
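To make the shrinking "# at risk" denominator concrete, here is a minimal sketch in Python (not Stata, and not an actual Cox fit). The spell data and the helper name risk_sets are made up for illustration; the sketch only counts, at each failure time, how many spells failed and how many were still at risk, first with all spells and then with the censored spells dropped.

```python
# Hypothetical toy data (an assumption for illustration, not real spells):
# each tuple is (spell length, failed), where failed=False means right-censored.
spells = [
    (2, True), (3, False), (4, True), (5, False),
    (6, True), (7, False), (8, True), (9, False),
]


def risk_sets(data):
    """Return (time, n_failed, n_at_risk) for each distinct failure time.

    A spell is at risk at time ft if its length is >= ft, whether it
    later ends in failure or in censoring.
    """
    failure_times = sorted({t for t, failed in data if failed})
    rows = []
    for ft in failure_times:
        n_failed = sum(1 for t, failed in data if failed and t == ft)
        n_at_risk = sum(1 for t, _ in data if t >= ft)
        rows.append((ft, n_failed, n_at_risk))
    return rows


# Risk sets using every spell, censored or not.
full = risk_sets(spells)

# Risk sets after dropping the censored spells (50% of the toy data).
completed_only = risk_sets([s for s in spells if s[1]])

for (t, d, n_full), (_, _, n_drop) in zip(full, completed_only):
    print(f"t={t}: failed={d}, at risk (all spells)={n_full}, "
          f"at risk (censored dropped)={n_drop}")
```

With these toy numbers the "# failed" count is 1 at every failure time in both versions, while the "# at risk" counts go 8, 6, 4, 2 with all spells and 4, 3, 2, 1 once the censored spells are dropped. How much this distorts an actual HR depends on how censoring is distributed across covariate groups, which this sketch does not model.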
_Here's an example to clarify_

Say you're conducting a small study, and the ONLY data involves one instantaneous prevalence calculation:

[# dead at this particular moment] / [# at risk for death at this particular moment]

Now, imagine one month has gone by since you performed the calculation. A colleague (involved in a completely unrelated study) informs you that 50% of your original "at risk" sample has now just declined participation in one of HIS/HER studies. If YOUR study only concerns the original prevalence calculation, would it make sense to drop those subjects unavailable to your colleague's study from your original prevalence calculation?

Since their declination from your colleague's study has no bearing on whether they were dead or alive in YOUR study one month ago, it makes no sense to drop them. They contributed valuable information.

But furthermore, removing them might be harmful. The subjects who were alive (not in the "# dead" numerator) in your prevalence calculation are the ONLY subjects in your study capable of declining study involvement one month later. Therefore, eliminating these subjects BY DEFINITION biases your prevalence calculation.

So, in conclusion:

The likelihood of censorship at time=t is independent of the likelihood of failure at time>t. However, the likelihood of failure at time=t is NOT independent of the likelihood of censorship at time>t: a subject who has already failed can no longer be censored. If you discount subjects' past survival contributions based on future censorship, you will be at risk of artifactually "powering" your resulting HRs. Given the 50% censorship you've mentioned, this increased "power" will surely result when dropping this many subjects, and may result in more "powerful" HRs or even a change in HR direction.

Nicole

On Sat, Sep 14, 2013 at 7:37 AM, Kai Huang <demonsecret@hotmail.com.hk> wrote:
> Dear all,
>
> I have estimated competing-risks Cox models on unemployment spells.
> The estimates are not very significant probably due to the small number of spells. I wonder whether the high proportion of censored spells in the model (50%) matter. Does it make sense if I drop all the right-censored spells and estimate the models with all spells being completed? Thank you very much in advance.
>
> Best regards,
> Kai Huang
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/