I am tired: the critical assumption behind Multiple Imputation is that
the probability of missingness does not depend on the value of the
missing variable itself (Missing At Random, or MAR). This is obviously
not the case with censoring. My objection to (conditional) mean
imputation, and my remark about selecting on the independent variables,
still hold. So, given that you have a large number of observations, I
would just ignore the zero observations.
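Concretely, something along these lines (a rough sketch only; died and
age are made-up names standing in for your outcome and covariates):

gen lntrop = ln(troponin) if troponin >= .01
* the censored (zero) observations are simply left out of the model
logit died lntrop age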
Maarten
--- In [email protected], "maartenbuis" <maartenbuis@y...> wrote:
> Hi Daniel,
>
> It looks to me like you could use -tobit- for log(troponin) and just a
> constant. The predicted values should give you the extrapolations you
> want. (This will be the same value for all missing observations: the
> mean of the log-normal distribution conditional on being less than the
> censoring value.)
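> A rough sketch of that (troponin is a placeholder name; .01 is the
> detection limit, so the censoring point on the log scale is
> ln(.01), about -4.605):
>
> * set the censored zeros to the detection limit before taking logs
> gen lntrop = ln(max(troponin, .01))
> * tobit on a constant only; -ll- treats the minimum as left-censored
> tobit lntrop, ll
> * E(log troponin | log troponin < ln(.01)): the same value for every obs
> predict cmean, e(., -4.605)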
>
> However, these are actually missing values, and apparently you want to
> create imputations for them. If you just use the values you obtained
> from -predict-, you will be assuming that you are as sure about these
> values as you are about the values you actually observed, and thus get
> standard errors that are too small. If you really want to impute, then
> you could have a look at -mice- (findit mice). Alternatively, you
> could use the results from -tobit- to generate multiple imputations.
> Mail me if you want to do that, and I can write, tonight or tomorrow,
> an example for the infamous auto dataset. However, censoring on the
> independent variable is generally much less of a problem than censoring
> on the dependent variable, so ignoring (throwing away) the censored
> observations should not lead to very different estimates.
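> The idea would be roughly this (the numbers for mu and sigma are made
> up; in practice you would plug in the -tobit- estimates of the mean
> and standard deviation of log troponin):
>
> scalar mu    = -3.2
> scalar sigma =  1.5
> scalar lim   = ln(.01)
> * one set of draws from the normal truncated above at the censoring point
> gen u = runiform()*normal((lim - mu)/sigma) if troponin == 0
> gen lntrop_imp = cond(troponin == 0, mu + sigma*invnormal(u), ln(troponin))
> * for proper multiple imputations you would repeat this several times,
> * and also draw mu and sigma from their estimated sampling distribution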
>
> HTH,
> Maarten
>
> --- "Daniel Waxman" <dan@a...> wrote:
> > I am modeling a laboratory test (Troponin I) as an independent
> > (continuous) predictor of in-hospital mortality in a sample of
> > 10,000 subjects. <snip> The problem is the zero values, what they
> > represent, and what to do with them. The distribution of results
> > ranges from the minimal detectable level of .01 mcg/L to 94 mcg/L,
> > with results markedly right-skewed (nearly half the results
> > are zero; 90% are < .20; results are given in increments
> > of .01). Of course, zero is a censored value which represents a
> > distribution of results between zero and somewhere below .01.
> <snip>
> > I found a method attributed to A.C. Cohen for doing essentially this,
> > which uses a lookup table to calculate the mean and standard
> > deviation of an assumed log-normal distribution based upon the
> > non-censored data and the proportion of data points that are
> > censored, but there must be a better way to do this in Stata.
> >
> > Any thoughts on (1) whether it is reasonable to assume the
> > log-normal distribution (I've played with qlognorm and plognorm, but
> > it's hard to know what is good enough), and if so (2) how to do it?
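> A quick graphical check of the log-normal assumption, using only the
> uncensored values (sketch only; troponin is a placeholder name):
>
> gen lntrop_obs = ln(troponin) if troponin >= .01
> qnorm lntrop_obs
> * the logs of the uncensored values should look roughly normal;
> * the censored lower tail is of course not visible in this plot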
>
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/