Svend,
Thanks. I did indeed look extensively at the predictor, both as a
categorical variable and as a continuous predictor with .005 substituted
for the zeros. My dataset is large enough, and events common enough, that
the confidence intervals are quite narrow at the .01 level. There is a
threshold, but it is below .01. In other words, there is no measurable
change in outcome between .01 and .02, but there is one between
'undetectable' and .01.
Zero could be .005, but it could just as well be .0005 or .00005 (and that
is true biologically as well). I suppose this becomes irrelevant very soon,
though, if it can't be measured. However, the logistic equation suggests
(given the measured number of deaths at the zero value) that the zero
should be approximately .001.
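
The back-calculation from the logistic equation can be sketched as follows.
This is only an illustration in Python. It assumes a model of the form
logit(p) = b0 + b1*ln(x) fitted on the detectable results; the coefficients
b0 and b1 and the death proportion p_zero are made-up numbers, not values
from the dataset discussed here.

```python
import math

def logit(p):
    """Log-odds of a proportion p, 0 < p < 1."""
    return math.log(p / (1.0 - p))

def implied_value(p_zero, b0, b1):
    """Solve b0 + b1*ln(x0) = logit(p_zero) for x0."""
    return math.exp((logit(p_zero) - b0) / b1)

# Hypothetical inputs, for illustration only.
b0, b1 = -1.0, 0.30   # assumed intercept and slope on ln(x)
p_zero = 0.05         # assumed proportion of deaths among the zeros

x0 = implied_value(p_zero, b0, b1)
print(f"value implied for the zeros: {x0:.5f}")
```

The result is only as good as the assumption that the fitted line keeps
the same slope below the detection limit.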
It seems that this is a common issue in the environmental literature, where
people care a lot about very small concentrations of things (lead, arsenic,
etc.). I have found various sources that suggest that the method of Cohen
(mentioned below), which estimates the entire distribution curve from the
available points and the known or assumed shape, can be preferable to
arbitrarily picking half of the lower limit.
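
A rough sketch of the maximum-likelihood idea behind that kind of method,
not Cohen's own tabulated procedure: assume the lab value is lognormal with
a single detection limit, let each detected value contribute its density
and each non-detect contribute the probability of falling below the limit,
and maximise over the mean and SD of ln(x). The data below are simulated,
the function names are hypothetical, and the crude pattern search stands in
for a proper optimiser.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(mu, sigma, logs, n_censored, log_limit):
    """Normal model for ln(x), left-censored at log_limit (constants dropped)."""
    ll = sum(-math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2 for y in logs)
    p_below = norm_cdf((log_limit - mu) / sigma)
    return ll + n_censored * math.log(max(p_below, 1e-300))

def fit_left_censored(detected, n_censored, limit):
    """Estimate mu and sigma of ln(x) by a simple pattern search."""
    logs = [math.log(x) for x in detected]
    log_limit = math.log(limit)
    mu = sum(logs) / len(logs)
    sigma = max(math.sqrt(sum((y - mu) ** 2 for y in logs) / len(logs)), 1e-3)
    best = log_likelihood(mu, sigma, logs, n_censored, log_limit)
    step = 0.5
    while step > 1e-6:
        moved = False
        for d_mu, d_sigma in ((step, 0), (-step, 0), (0, step), (0, -step)):
            m, s = mu + d_mu, sigma + d_sigma
            if s <= 0:
                continue
            ll = log_likelihood(m, s, logs, n_censored, log_limit)
            if ll > best:
                mu, sigma, best, moved = m, s, ll, True
        if not moved:
            step /= 2  # no neighbour is better: refine the search
    return mu, sigma

def mean_below_limit(mu, sigma, limit):
    """E[x | x < limit] under the fitted lognormal."""
    a = (math.log(limit) - mu) / sigma
    return math.exp(mu + 0.5 * sigma ** 2) * norm_cdf(a - sigma) / norm_cdf(a)

# Simulated data: true mu = -3, sigma = 1.5, detection limit 0.01.
random.seed(1)
limit = 0.01
values = [math.exp(random.gauss(-3.0, 1.5)) for _ in range(2000)]
detected = [v for v in values if v >= limit]
n_censored = len(values) - len(detected)

mu_hat, sigma_hat = fit_left_censored(detected, n_censored, limit)
fill_in = mean_below_limit(mu_hat, sigma_hat, limit)
print(mu_hat, sigma_hat, fill_in)
```

The last quantity, the expected value of a result given that it lies below
the limit, is one candidate for what the zeros "represent" under the assumed
shape, as opposed to half the limit.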
Daniel
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Svend Juul
Sent: Sunday, June 05, 2005 9:47 AM
To: [email protected]
Subject: RE: st: how to deal with censoring at zero (a lot of zeroes) for a
laboratory result which I would like to log transform
Daniel,
You wonder how to handle zero values in a predictor you have
good reasons to log-transform.
For a first look I would make a reasonable categorization of the
predictor, e.g. five categories (0, 0.01-0.09, 0.10-0.99, 1-10, 10+)
and use -xi: logistic- to see the pattern. This analysis might also
give an idea whether there is some threshold.
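
A minimal sketch of that first look, in Python rather than Stata, and as a
plain tabulation of the proportion of deaths per category rather than a
logistic fit. The records are invented for illustration, and the category
boundaries are read as half-open intervals (a value of exactly 10 is put in
"10+"), which is an assumption.

```python
from collections import defaultdict

def category(x):
    """Assign a lab value to one of the five suggested categories."""
    if x == 0:
        return "0"
    if x < 0.10:
        return "0.01-0.09"
    if x < 1:
        return "0.10-0.99"
    if x < 10:
        return "1-10"
    return "10+"

def death_proportions(records):
    """records: (lab_value, died) pairs with died in {0, 1}."""
    counts = defaultdict(lambda: [0, 0])  # category -> [deaths, total]
    for x, died in records:
        c = counts[category(x)]
        c[0] += died
        c[1] += 1
    return {k: deaths / total for k, (deaths, total) in counts.items()}

# Invented records, for illustration only.
records = [(0, 0), (0, 0), (0, 1), (0, 0), (0.01, 1), (0.02, 0),
           (0.5, 1), (0.5, 0), (3.0, 1), (25.0, 1)]
props = death_proportions(records)
for name in ("0", "0.01-0.09", "0.10-0.99", "1-10", "10+"):
    print(name, props.get(name))
```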
If this justifies using a log-transform, I think you almost give
the answer yourself: zero means a result somewhere between 0 and
0.01. So why not select 0.005, log-transform, and run -logistic-
using the log-transformed predictor.
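
The substitution step can be sketched like this (Python, for illustration;
the detection limit of 0.01 is taken from the discussion, while the function
name and the sample values are hypothetical). The fill value is a parameter,
so the same transform can be rerun with another choice to see how sensitive
the result is.

```python
import math

DETECTION_LIMIT = 0.01

def log_predictor(x, fill=DETECTION_LIMIT / 2):
    """Natural log of x, with zeros replaced by `fill` (default: half the limit)."""
    return math.log(fill if x == 0 else x)

# Invented lab values, for illustration only.
values = [0, 0.01, 0.5, 3.0]
logged = [log_predictor(v) for v in values]
print(logged)
```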
The idea to let the data determine the "best" value that the zeros
represent has its problems: The confidence interval for the odds
ratio estimate becomes too small.
Hope this helps
Svend