Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Different types of missing data and MI
From: Austin Nichols <[email protected]>
To: [email protected]
Subject: Re: st: Different types of missing data and MI
Date: Mon, 13 Jun 2011 20:50:09 -0400
Clyde Schechter <[email protected]>:
If it's true that your covariate falling below the detection limit is
not predictable from the other covariates or the outcome, then it is
apparently orthogonal to them and can be omitted with no effect on the
other coefficients; if that's true, then various imputation schemes
should also produce essentially the same estimates for the other
coefficients. You might estimate a -tobit- of the covariate that is
censored for a third of the sample on the other covariates (and
possibly the outcome), then impute from the linear prediction (xb)
plus a random draw from the error distribution; a detection limit is
one of the few instances in which -tobit- works really well in
simulations. You can also omit the covariate, or replace the censored
values with the detection limit, or with zero, and include a dummy for
missing (all known to be problematic in some cases, but not yours).
There are more options than these four, but if these four produce
similar results, you have a very good footnote for whatever table
makes the final cut: the results are invariant to this choice.
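
A minimal sketch of the imputation step only, assuming the -tobit- has
already been fit elsewhere and that its errors are normal. It is
written in Python with the standard library rather than Stata, purely
for illustration. The function name impute_below_limit, the detection
limit, the error SD, and the fitted values are all hypothetical and
are not taken from the thread. One assumption goes beyond the text
above: the draw is restricted to the part of the distribution below
the detection limit, so that an imputed value never exceeds it.

```python
import random
from statistics import NormalDist


def impute_below_limit(xb, sigma, limit, rng):
    """Draw one value from Normal(xb, sigma) conditional on being <= limit.

    xb    : linear prediction for a censored case (assumed to come from a tobit fit)
    sigma : estimated SD of the tobit error term (assumed known here)
    limit : lower detection limit of the assay
    rng   : a random.Random instance, passed in so results are reproducible
    """
    dist = NormalDist(mu=xb, sigma=sigma)
    # Probability mass lying below the detection limit.
    p_limit = dist.cdf(limit)
    # Inverse-CDF sampling: a uniform draw on (0, p_limit) maps to a value below the limit.
    u = rng.uniform(0.0, p_limit)
    # inv_cdf needs 0 < u < 1, so keep the draw strictly inside that range.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # The min() guards against rounding (or clamping) pushing the draw above the limit.
    return min(dist.inv_cdf(u), limit)


# Hypothetical inputs, made up for illustration only.
rng = random.Random(2011)
limit = 0.5                  # hypothetical detection limit
sigma = 1.0                  # hypothetical tobit error SD
fitted = [0.2, 0.9, -0.4]    # hypothetical xb for three censored cases
imputed = [impute_below_limit(xb, sigma, limit, rng) for xb in fitted]
```

Repeating the draw with different seeds would give several completed
data sets, and the other three options (omit, detection limit plus
dummy, zero plus dummy) can then be run alongside for comparison.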
On Mon, Jun 13, 2011 at 7:20 PM, Clyde Schechter
<[email protected]> wrote:
> My problem is a third kind of missing data. One of the covariates is the
> result of a lab assay that has a lower limit of detectability. So these
> data are not missing in the full sense, rather they are left censored at
> the lower limit of detectability (or, more properly, interval censored
> between zero and the lower limit of detectability). I don't know what to
> do with these. -mi- doesn't seem applicable since these are certainly not
> missing at random. And any way I can think of to try to impute values
> here strikes me as inherently invalid because it appears that the data
> simply do not contain any information whatsoever about the relationship
> between this variable and the outcome (or anything else) in the
> undetectable range. And I don't know of any analytic methods that handle
> interval-censored independent variables.
>
> For now, because the lower limit of detectability is close to zero, and
> because analyses and graphical explorations excluding these cases suggest
> that this variable is not associated with the outcome anyhow, I've done an
> analysis where I simply recode these particular values as zero. But I
> can't escape the feeling that this is not really defensible.
>
> There are two alternatives I would prefer. One is to simply omit these
> cases altogether--but there are a lot of them, about a third of the
> sample, and it would leave us rather underpowered. The other is to just
> drop this variable (especially since it doesn't seem to be associated with
> the outcome anyway, at least outside the censoring range)--but the
> variable was actually identified in our study aims as one of the key
> predictors of interest. (I guess we weren't very prescient!)
>
> Any advice would be appreciated. Thanks in advance