Dear Daniel:
I also wonder whether you could model the data as a left-censored
continuous distribution, where you cannot observe the values "below zero."
This might simplify the challenge of getting the distributional assumptions
right, compared with assuming that the variable is strictly non-negative.
Kevin.
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Daniel Waxman
Sent: Saturday, June 04, 2005 8:53 PM
To: [email protected]
Cc: 'Gregg Husk'
Subject: st: how to deal with censoring at zero (a lot of zeroes) for a
laboratory result which I would like to log transform
Hello,
I have a problem which I believe must be commonly encountered, and which
must have a simple solution in Stata, but I just can't see it.
I am modeling a laboratory test (Troponin I) as an independent (continuous)
predictor of in-hospital mortality in a sample of >10,000 subjects. A
simple model seems to fit well: In a logistic model, the odds ratio for the
log-transformed result (dropping the zeroes, or making something up) remains
relatively constant with whatever I throw in with it.
The problem is the zero values, what they represent, and what to do with
them. The distribution of results ranges from the minimal detectable level
of .01 mcg/L to 94 mcg/L, with results markedly skewed to the right (nearly
half the results are zero; 90% are < .20; results are given in increments
of .01). Of course, a reported zero is a censored value: it stands for some
result between zero and just below .01.
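In likelihood terms, treating each zero as "somewhere below .01" means it contributes a probability rather than a density. Assuming, for illustration only, that ln(troponin) is normal with mean mu and standard deviation sigma, and writing c = .01 for the detection limit, n_0 for the number of zeros and y_i for the measured results (all >= c), the censored log-likelihood would be:

```latex
\ell(\mu, \sigma) = n_0 \log \Phi\!\left(\frac{\ln c - \mu}{\sigma}\right)
  + \sum_{i:\, y_i \ge c} \left[ \log \phi\!\left(\frac{\ln y_i - \mu}{\sigma}\right) - \log \sigma - \ln y_i \right]
```

Here Phi and phi are the standard normal CDF and density. The ln y_i term does not depend on mu or sigma, so it can be dropped when maximizing.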
I noted that plotting N vs. log(troponin) is for all practical purposes
linear at the lowest measurable concentrations. It seems to me that if the
distribution of results were predictable, then I should be able to
extrapolate back to what the best point estimate for the zero would be.
(I could then compare this value with the one obtained by inverting the logit
equation at the known fraction of deaths among subjects with a measured value of zero.)
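If one is willing to assume the log-normal (still the open question), the "best point estimate for the zero" has a closed form once mu and sigma of ln(troponin) have been estimated. The minimal Python sketch below takes mu and sigma as already fitted; `censored_point_estimates` is a hypothetical helper, not a Stata or library function. It returns the expected value of the censored results on the mcg/L scale, E[Y | Y < .01], and on the log scale, E[ln Y | Y < .01]. The log-scale value is probably the more natural one to plug into a model that uses log(troponin).

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_point_estimates(mu: float, sigma: float, limit: float = 0.01):
    """Hypothetical helper: expected value of results reported as zero,
    assuming ln(Y) ~ Normal(mu, sigma**2) and that zero means Y < limit.

    Returns (E[Y | Y < limit], E[ln Y | Y < limit]).
    """
    a = (math.log(limit) - mu) / sigma  # standardized log detection limit
    p_below = norm_cdf(a)               # model-implied fraction of zeros
    # Partial expectation of a lognormal below the limit, divided by P(Y < limit)
    mean_raw = math.exp(mu + sigma * sigma / 2.0) * norm_cdf(a - sigma) / p_below
    # Mean of a normal distribution truncated from above at ln(limit)
    mean_log = mu - sigma * norm_pdf(a) / p_below
    return mean_raw, mean_log
```

For example, `censored_point_estimates(math.log(0.01), 2.3)` uses made-up parameters under which half the results fall below .01; both estimates then come out well below the detection limit, with exp(E[ln Y]) smaller than E[Y], as Jensen's inequality requires.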
I believe that it might be reasonable to assume a log-normal distribution of
results, but I am not sure about this.
I found a method, attributed to A.C. Cohen, for doing essentially this. It
uses a lookup table to calculate the mean and standard deviation of an
assumed log-normal distribution from the non-censored data and the
proportion of data points that are censored, but there must be a better way
to do this in Stata.
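One alternative to the lookup table is to maximize the censored log-normal likelihood directly; as I understand it, Cohen's tables are a hand-calculation aid for maximum-likelihood estimates of this kind. Below is a minimal Python sketch using the EM algorithm for a normal distribution censored from below, applied to ln(troponin). The function name is hypothetical, and the sketch assumes that every zero means "below .01" and ignores the rounding to .01 increments. In Stata itself, `intreg` on the log scale (lower bound missing for the zeros, upper bound ln(.01); both bounds equal to ln(result) otherwise) should fit the same model.

```python
import math
import random


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_left_censored_lognormal(observed, n_censored, limit=0.01,
                                tol=1e-8, max_iter=1000):
    """Hypothetical helper: ML fit of ln(Y) ~ Normal(mu, sigma**2) when
    n_censored results are only known to lie below `limit`.

    observed: measured results (all >= limit)
    n_censored: number of results reported as zero
    Returns (mu, sigma) on the log scale, computed with the EM algorithm.
    """
    logs = [math.log(v) for v in observed]
    c = math.log(limit)
    n = len(logs) + n_censored
    sum_x = sum(logs)
    sum_x2 = sum(x * x for x in logs)

    # Starting values: moments of the measured results only
    # (these ignore the zeros, so mu starts too high).
    mu = sum_x / len(logs)
    sigma = math.sqrt(sum_x2 / len(logs) - mu * mu)

    for _ in range(max_iter):
        # E-step: first two moments of a normal truncated to (-inf, c).
        a = (c - mu) / sigma
        lam = norm_pdf(a) / norm_cdf(a)
        e1 = mu - sigma * lam
        e2 = sigma * sigma * (1.0 - a * lam - lam * lam) + e1 * e1

        # M-step: ordinary normal ML estimates from the completed sums.
        new_mu = (sum_x + n_censored * e1) / n
        new_sigma = math.sqrt((sum_x2 + n_censored * e2) / n - new_mu * new_mu)

        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


if __name__ == "__main__":
    # Simulated data only; parameters chosen so about half the results are "zero".
    rng = random.Random(2005)
    draws = [rng.lognormvariate(math.log(0.01), 2.3) for _ in range(10000)]
    measured = [y for y in draws if y >= 0.01]
    zeros = len(draws) - len(measured)
    mu, sigma = fit_left_censored_lognormal(measured, zeros)
    print(f"zeros: {zeros / len(draws):.1%}  mu: {mu:.3f}  sigma: {sigma:.3f}")
```

EM is used here only because it needs nothing beyond the standard library; a general-purpose optimizer applied to the censored log-likelihood would reach the same estimates.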
Any thoughts on (1) whether it is reasonable to assume the log-normal
distribution (I've played with qlognorm and plognorm, but it's hard to know
what is good enough), and if so (2) how to do it?
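For question (1), one rough check, offered only as a sketch: after fitting mu and sigma, compare the observed cumulative proportion of results at or below a few cut points with what the fitted log-normal implies. Large or systematic gaps, especially in the upper range, would argue against the log-normal. The helper names and cut points are illustrative assumptions. Because results are reported in .01 increments, small discrepancies near the detection limit are expected, and since the parameters are estimated from the same data, this is a descriptive check rather than a formal test.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_cdf_check(values, mu, sigma, cutpoints):
    """Hypothetical helper: observed vs fitted cumulative proportions.

    values: every result, with censored results kept as 0.0
    mu, sigma: fitted parameters of ln(Y)
    cutpoints: concentrations (mcg/L) at which to compare, all > 0
    Returns a list of (cut, observed proportion <= cut, fitted P(Y <= cut)).
    """
    n = len(values)
    rows = []
    for cut in cutpoints:
        observed = sum(1 for v in values if v <= cut) / n
        fitted = norm_cdf((math.log(cut) - mu) / sigma)
        rows.append((cut, observed, fitted))
    return rows


def max_gap(rows):
    """Largest absolute difference between observed and fitted proportions."""
    return max(abs(obs - fit) for _, obs, fit in rows)
```

Zeros are kept as 0.0 in `values` so that they count as "at or below" every positive cut point, which matches their meaning as results below .01.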
Thanks.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*