Re: st: Situation where multiple imputation may be of no use?
From: Richard Williams <[email protected]>
To: [email protected], "[email protected]" <[email protected]>
Subject: Re: st: Situation where multiple imputation may be of no use?
Date: Thu, 09 Feb 2012 17:24:58 -0500
At 05:06 PM 2/9/2012, Clyde B Schechter wrote:
This is a question of a statistical nature about what multiple
imputation can accomplish.
I have used MI a few times, and I have a general understanding of
how it works and the underlying theory, but not in great depth.
I'm working with a colleague to plan an experiment. This
description is oversimplified but, I believe, provides the essence
of it. Subjects will be enrolled and baseline data obtained. They
will then be randomly assigned to intervention or placebo
groups. After enough time for the intervention to work has elapsed,
the outcome, a continuous variable, will be assessed, once and only once.
Based on some preliminary studies, we expect that about 15-20% of
the participants will not return for the outcome assessment. Given
our fairly small anticipated effect size (due mostly to noise in the
outcome assessment that we can't think of any way to reduce with
available technology), the sample size we need to adequately power
our study is, as it turns out, about 20% greater than we will be
able to manage within budget. So, if there were no losses to
follow-up, we'd be just OK. But there will be losses to follow-up,
and efforts to reduce that will also eat into the budget. (As would
getting two outcome assessments and using the average or doing a
mixed model.) So my colleague has suggested that when we analyze
our data we use multiple imputation to make up for the missing
data. I'm by no means opposed to doing that, but I don't think it
will help us with regard to statistical power.
I understand that MI lets you squeeze out all the information that is
really there in the existing data set, and can even correct some of
the bias that can result from using listwise deletion. But in our case,
the only missing data will be the outcome measurement. We will have
complete data on everything else. So it seems to me that MI in
this context will just amount to carrying out a listwise-deletion
analysis, multiply extrapolating the results of that to the
cases with missing outcomes, and then combining the analyses of the
imputed data sets in a way that reflects the between-imputed-samples
variation. If I am thinking about this correctly, the added
variance from the multiple imputations should pretty much balance
the reduction in standard error that comes from (appearing to) use
the full sample size. If this were not true, then MI would be
synthesizing information ex nihilo. So, my instincts tell me that
we will not solve our statistical power problem by using MI
analysis. I have run a few simulations, and they support my opinion,
but I wanted to run this by some people who understand MI better than I do.
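
A rough sketch of the kind of simulation described, in case it helps
make the question concrete (the effect size, sample size, and
missingness rate below are illustrative assumptions, not the actual
design values, and missingness is taken to be completely at random):

* one simulated trial: small treatment effect, ~20% of outcomes
* missing completely at random
clear all
set seed 2012
set obs 300
gen byte treat = runiform() < .5          // random assignment
gen x = rnormal()                         // baseline covariate
gen y = .25*treat + .5*x + rnormal()      // true effect = .25
gen ymis = cond(runiform() < .2, ., y)    // ~20% lost to follow-up

* complete-case (listwise deletion) analysis
regress ymis treat x

* multiple imputation of the missing outcomes only
mi set mlong
mi register imputed ymis
mi impute regress ymis treat x, add(20)
mi estimate: regress ymis treat x

* with nothing missing except the outcome, the MI standard error for
* treat should come out close to the complete-case standard error;
* wrap this in -simulate- and compare rejection rates over many
* replications to check the power question directly.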
In general, I don't think you gain much by imputing values of the
dependent variable. See
http://www.ats.ucla.edu/stat/stata/seminars/missing_data/mi_in_stata_pt1.htm
Excerpt: "One common question about imputation is whether the
dependent variable should be included in the imputation model. The
answer is yes: if the dependent variable is not included in the
imputation model, the imputed values will not have the same
relationship to the dependent variable that the observed values do.
In other words, if the dependent variable is not included in the
imputation model, you may be artificially reducing the strength of
the relationship between the independent and dependent variables.
After the imputations have been created, the issue of how to treat
imputed values of the dependent variable becomes more nuanced. If the
imputation model contains only those variables in the analysis model,
then using the imputed values of the dependent variable does not
provide additional information, and actually introduces additional
error (von Hippel 2007). As a result some authors suggest including
the dependent variable in the imputation model, which may include
imputing values, and then excluding any cases with imputed values for
the dependent variable from the final analysis (von Hippel 2007). If
the imputation was performed using auxiliary variables or if the
dataset was imputed without a specific analysis model in mind, then
using the imputed values of the dependent variable may provide
additional information. In these cases, it may be useful to include
cases with imputed values of the dependent variable in the analysis
model. Note that it is relatively easy to test the sensitivity of
results to the inclusion of cases with imputed values of the
dependent variable by running the analysis model with and without those cases."
A pre-publication version of the von Hippel paper is at
http://www.sociology.ohio-state.edu/people/ptv/publications/Missing%20Y/accepted.pdf
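
For what it is worth, the "impute, then delete" approach described in
the excerpt might be coded roughly as follows (a minimal sketch, not
code from the paper or the UCLA page; y, x1, and x2 are placeholder
names for variables that each have some missing values, and chained
equations is just one reasonable choice of imputation method):

* flag cases with an observed outcome before imputing, so cases with
* an imputed y can be dropped from the analysis model (von Hippel 2007)
gen byte y_observed = !missing(y)

mi set wide
mi register imputed y x1 x2
mi register regular y_observed
mi impute chained (regress) y x1 x2, add(20) rseed(20120209)

* y is included in the imputation model above, but cases whose y was
* imputed are excluded from the analysis model
mi estimate: regress y x1 x2 if y_observed

* sensitivity check suggested in the excerpt: rerun including the
* cases with imputed y and compare
mi estimate: regress y x1 x2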
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam