Thank you.
I've discovered that the 'mfp' program (multivariable fractional
polynomials) has a convenient 'catzero' option, which automates
the process of converting the zeroes to a separate binary predictor before
fitting the model. Very useful!
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Arnold Kester
Sent: Wednesday, June 08, 2005 9:09 AM
To: [email protected]
Subject: Re: st: how to deal with censoring at zero (a lot of zeroes) for a
laboratory re
On 06/06/2005 01:17 PM, Daniel Waxman wrote:
> Maarten, Kevin,
>
> Thank you very much for your replies. So for now I am just going to give up
> trying to make distributional assumptions and drop the half of the
> observations which are zero or non-detectable prior to log transforming the
> predictor and creating the logistic model. In fact, whether I do this or
> change the zero to half of the lowest detectable value (i.e. .005) doesn't
> have much of an effect on the logistic odds ratio.
>
> If anybody has any objections to this (or sees how a statistical reviewer
> for a medical journal might have objections), please let me know.
If you drop observations based on their value of a predictor variable
you are in fact changing the protocol of your study. The inclusion
criteria are changed to include "Troponin I is detectable". Results
would be valid for people with detectable values only.
If you want to get a prediction for undetectable Troponin without
assuming a specific value you could add a dummy variable troponin_zero =
(troponin == 0) and substitute (say) zero for log(troponin) when
troponin==0. The predicted value from this model is independent of what
you choose for "log(0)".
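
The point about the placeholder can be checked numerically. What follows is a minimal sketch in plain Python, not anyone's actual analysis: the troponin and mortality values are made up, and the helper names (solve, fit_logistic, fitted_probabilities) are hypothetical. It fits the same logistic model twice, once with 0 and once with log(.005) standing in for "log(0)", and compares the fitted probabilities.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(X, y, iterations=25):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(coef * value for coef, value in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * xi[i]
                for j in range(k):
                    hess[i][j] += w * xi[i] * xi[j]
        step = solve(hess, grad)
        beta = [coef + s for coef, s in zip(beta, step)]
    return beta

# Made-up illustration data: troponin in mcg/L (0 = not detectable), death 0/1.
TROPONIN = [0, 0, 0, 0, 0, 0, 0.01, 0.02, 0.05, 0.10,
            0.20, 0.50, 1.0, 5.0, 20.0, 94.0]
DIED = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

def fitted_probabilities(placeholder):
    """Fit died ~ constant + log(troponin) + troponin_zero.

    `placeholder` is the arbitrary value used for log(troponin) when
    troponin == 0.
    """
    X = []
    for t in TROPONIN:
        troponin_zero = 1.0 if t == 0 else 0.0
        log_troponin = math.log(t) if t > 0 else placeholder
        X.append([1.0, log_troponin, troponin_zero])
    beta = fit_logistic(X, DIED)
    probs = [1.0 / (1.0 + math.exp(-sum(c * v for c, v in zip(beta, xi))))
             for xi in X]
    return beta, probs

beta_a, probs_a = fitted_probabilities(0.0)
beta_b, probs_b = fitted_probabilities(math.log(0.005))

# Largest difference in fitted probability between the two codings.
print(max(abs(p - q) for p, q in zip(probs_a, probs_b)))
# The dummy's coefficient, by contrast, depends on the placeholder.
print(beta_a[2], beta_b[2])
```

The two codings span the same set of linear predictors, so the fitted probabilities and the slope on log(troponin) agree up to rounding; only the coefficient on the dummy shifts to absorb the placeholder. The fitted probability for the zero group is simply the observed proportion of deaths in that group.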
Arnold
>
> Daniel
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of maartenbuis
> Sent: Sunday, June 05, 2005 7:38 PM
> To: [email protected]
> Subject: Re: st: how to deal with censoring at zero (a lot of zeroes) for a
> laboratory re
>
> I am tired: the critical assumption behind Multiple Imputation is that
> the probability of missingness does not depend on the value of the
> missing variable itself (Missing At Random, or MAR). This is obviously
> not the case with censoring. My objection against (conditional) mean
> imputation, and my remark about selecting on the independent variables
> still hold. So, given that you have a large number of observations, I
> would just ignore the zero observations.
>
> Maarten
>
> --- In [email protected], "maartenbuis" <maartenbuis@y...> wrote:
>
>>Hi Daniel,
>>
>>It looks to me like you could use -tobit- for log(troponin) and just a
>>constant. The predicted values should give you the extrapolations you
>>want. (This will be the same value for all missing observations: the
>>mean of the log-normal distribution conditional on being less than the
>>censoring value)
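
The single value described in the parenthesis above has a closed form on the log scale. This is a small sketch under stated assumptions: log(troponin) is taken to be Normal(mu, sigma^2), mu and sigma are hypothetical stand-ins for whatever -tobit- would estimate, and the function names are made up for illustration.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_log_below_limit(mu, sigma, log_limit):
    """E[Y | Y < log_limit] for Y ~ Normal(mu, sigma^2).

    Uses the lower-truncated normal mean:
    mu - sigma * pdf(a) / cdf(a), with a = (log_limit - mu) / sigma.
    """
    a = (log_limit - mu) / sigma
    return mu - sigma * normal_pdf(a) / normal_cdf(a)

# Hypothetical parameter values, NOT estimates from the troponin data.
mu, sigma = -4.0, 2.0
log_limit = math.log(0.01)  # detection limit of .01 mcg/L on the log scale

print(mean_log_below_limit(mu, sigma, log_limit))
```

Every censored observation receives this same number, which is why using it as if it were observed understates the uncertainty.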
>>
>>However, these are actually missing values, and apparently you want to
>>create imputations for them. If you just use the values you obtained
>>from -predict- you will be assuming that you are as sure about these
>>values as you are about the values you actually observed, and thus get
>>standard errors that are too small. If you really want to impute, then
>>you could have a look at -mice- (findit mice). Alternatively, you
>>could use the results from -tobit- to generate multiple imputations.
>>Mail me if you want to do that, and I can write, tonight or tomorrow,
>>an example for the infamous auto dataset. However, censoring on the
>>independent variable is generally much less of a problem than censoring
>>on the dependent variable, so ignoring (throwing away) the censored
>>observations should not lead to very different estimates.
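
One way to picture "using the results from -tobit- to generate multiple imputations" is to draw each censored log value at random from the part of the fitted normal distribution that lies below the detection limit. The sketch below is an assumption-laden illustration, not the example offered in the message: mu and sigma are hypothetical, the function name is made up, and a proper multiple imputation would also redraw mu and sigma for each imputed dataset to reflect their estimation uncertainty.

```python
import math
import random
from statistics import NormalDist

def draw_logs_below_limit(mu, sigma, log_limit, n, rng):
    """Draw n values from Normal(mu, sigma^2) truncated to (-inf, log_limit].

    Inverse-CDF sampling: a uniform draw on (0, P(Y < log_limit)] is mapped
    back through the normal quantile function.
    """
    dist = NormalDist(mu, sigma)
    p_below = dist.cdf(log_limit)
    draws = []
    for _ in range(n):
        u = (1.0 - rng.random()) * p_below  # strictly positive, at most p_below
        draws.append(min(dist.inv_cdf(u), log_limit))
    return draws

# Hypothetical parameter values, NOT estimates from the troponin data.
mu, sigma = -4.0, 2.0
log_limit = math.log(0.01)

rng = random.Random(12345)  # fixed seed so the sketch is reproducible
imputations = [draw_logs_below_limit(mu, sigma, log_limit, 5, rng)
               for _ in range(3)]  # three imputed sets of five censored values
for imputed in imputations:
    print([round(v, 2) for v in imputed])
```

Unlike the single predicted value, the draws differ between imputed datasets, so the spread across datasets carries the extra uncertainty into the standard errors.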
>>
>>HTH,
>>Maarten
>>
>>--- "Daniel Waxman" <dan@a...> wrote:
>>
>>>I am modeling a laboratory test (Troponin I) as an independent
>>>(continuous) predictor of in-hospital mortality in a sample of
>>>10,000 subjects. <snip> The problem is the zero values, what they
>>>represent, and what to do with them. The distribution of results
>>>ranges from the minimal detectable level of .01 mcg/L to 94 mcg/L,
>>>with results markedly skewed to the right (nearly half the results
>>>are zero; 90% are < .20; results are given in increments
>>>of .01). Of course, zero is a censored value which represents a
>>>distribution of results between zero and somewhere below .01.
>>
>><snip>
>>
>>>I found a method attributed to A.C. Cohen of doing essentially this
>>>which uses a lookup table to calculate the mean and standard
>>>deviation of an assumed log-normal distribution based upon the
>>>non-censored data and the proportion of data points that are
>>>censored, but there must be a better way to do this in Stata.
>>>
>>>Any thoughts on (1) whether it is reasonable to assume the
>>>log-normal distribution (I've played with qlognorm and plognorm, but
>>>it's hard to know what is good enough), and if so (2) how to do it?
>>
>>
>>
>>
>>*
>>* For searches and help try:
>>* http://www.stata.com/support/faqs/res/findit.html
>>* http://www.stata.com/support/statalist/faq
>>* http://www.ats.ucla.edu/stat/stata/
>
--
Kind regards,
Arnold Kester