I am always queasy when expected to approve,
or invited to disapprove, a proposed analysis.
How can anyone give a really worthwhile opinion
of what is sensible for someone's project on
this amount of information?
Nevertheless the notion of throwing away
half the data on this basis is rather alarming.
I have found cube roots often useful for non-negative
variables. This is partly empirical, partly that
zero goes to zero, but there is also an arm-waving basis
that cube roots work well for gamma distributions (cf. the
Wilson-Hilferty transformation). More generally,
powers falling towards zero in effect have the logarithm
as their limit.
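
A quick way to check both remarks numerically, as a rough sketch only. It is in Python rather than Stata, the gamma sample (shape 2, scale 1) is simulated and purely illustrative, and the helper names skewness and power_transform are made up for this example. The rescaled power (x^lambda - 1)/lambda is the usual Box-Cox form, which is my addition to make the limit visible.

```python
import math
import random


def skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def power_transform(x, lam):
    """Rescaled power (x**lam - 1) / lam; defined as log(x) at lam == 0."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


rng = random.Random(2005)  # fixed seed so the run is repeatable

# Hypothetical data: a skewed, strictly positive gamma sample.
sample = [rng.gammavariate(2.0, 1.0) for _ in range(20000)]

skew_raw = skewness(sample)
skew_cuberoot = skewness([x ** (1.0 / 3.0) for x in sample])
print(f"skewness raw: {skew_raw:.2f}  cube root: {skew_cuberoot:.2f}")

# Zero goes to zero under the cube root, so zeros need no special handling.
print(0.0 ** (1.0 / 3.0))

# As the power shrinks towards zero, the rescaled power approaches log(x).
for lam in (1.0, 1.0 / 3.0, 0.1, 0.001):
    print(lam, power_transform(0.2, lam), math.log(0.2))
```

The raw sample should come out clearly right-skewed and the cube-rooted one close to symmetric, which is the Wilson-Hilferty point. None of this says whether cube roots suit the troponin data; that needs looking at the data.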
Nick
[email protected]
Daniel Waxman wrote:
> Maarten, Kevin,
>
> Thank you very much for your replies. So for now I am just going to give up
> trying to make distributional assumptions and drop the half of the
> observations which are zero or non-detectable prior to log transforming the
> predictor and creating the logistic model. In fact, whether I do this or
> change the zero to half of the lowest detectable value (i.e. .005) doesn't
> have much of an effect on the logistic odds ratio.
>
> If anybody has any objections to this (or sees how a statistical reviewer
> for a medical journal might have objections), please let me know.
>
> Daniel
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of maartenbuis
> Sent: Sunday, June 05, 2005 7:38 PM
> To: [email protected]
> Subject: Re: st: how to deal with censoring at zero (a lot of zeroes) for a
> laboratory re
>
> I am tired: The critical assumption behind Multiple Imputation is that
> the probability of missingness does not depend on the value of the
> missing variable itself (Missing At Random, or MAR). This is obviously
> not the case with censoring. My objection against (conditional) mean
> imputation, and my remark about selecting on the independent variables
> still hold. So, given that you have a large number of observations, I
> would just ignore the zero observations.
>
> Maarten
>
> --- In [email protected], "maartenbuis"
> <maartenbuis@y...> wrote:
> > Hi Daniel,
> >
> > It looks to me like you could use -tobit- for log(tropin) and just a
> > constant. The predicted values should give you the extrapolations you
> > want. (This will be the same value for all missing observations: the
> > mean of the log-normal distribution conditional on being less than the
> > censoring value.)
> >
> > However, these are actually missing values, and apparently you want to
> > create imputations for them. If you just use the values you obtained
> > from -predict- you will be assuming that you are as sure about these
> > values as you are about the values you actually observed, and thus get
> > standard errors that are too small. If you really want to impute, then
> > you could have a look at -mice- (findit mice). Alternatively, you
> > could use the results from -tobit- to generate multiple imputations.
> > Mail me if you want to do that, and I can write, tonight or tomorrow,
> > an example for the infamous auto dataset. However, censoring on the
> > independent variable is generally much less a problem than censoring
> > on the dependent variable, so ignoring (throwing away) the censored
> > observations should not lead to very different estimates.
> >
> > HTH,
> > Maarten
> >
> > --- "Daniel Waxman" <dan@a...> wrote:
> > > I am modeling a laboratory test (Troponin I) as an independent
> > > (continuous) predictor of in-hospital mortality in a sample of
> > > 10,000 subjects. <snip> The problem is the zero values, what they
> > > represent, and what to do with them. The distribution of results
> > > ranges from the minimal detectable level of .01 mcg/L to 94 mcg/L,
> > > with results markedly skewed to the right (nearly half the results
> > > are zero; 90% are < .20; results are given in increments
> > > of .01). Of course, zero is a censored value which represents a
> > > distribution of results between zero and somewhere below .01.
> > > <snip>
> > > I found a method attributed to A.C. Cohen of doing essentially this
> > > which uses a lookup table to calculate the mean and standard
> > > deviation of an assumed log-normal distribution based upon the
> > > non-censored data and the proportion of data points that are
> > > censored, but there must be a better way to do this in Stata.
> > >
> > > Any thoughts on (1) whether it is reasonable to assume the
> > > log-normal distribution (I've played with qlognorm and plognorm, but
> > > it's hard to know what is good enough), and if so (2) how to do it?