From: Clyde B Schechter <clyde.schechter@einstein.yu.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: Situation where multiple imputation may be of no use?
Date: Thu, 9 Feb 2012 22:06:16 +0000
This is a question of a statistical nature about what multiple imputation can accomplish. I have used MI a few times, and I have a general understanding of how it works and the underlying theory, but not in great depth.

I'm working with a colleague to plan an experiment. This description is oversimplified but, I believe, captures the essence of it. Subjects will be enrolled and baseline data obtained. They will then be randomly assigned to intervention or placebo groups. After enough time has elapsed for the intervention to work, the outcome, a continuous variable, will be assessed once and only once. Based on some preliminary studies, we expect that about 15-20% of the participants will not return for the outcome assessment.

Given our fairly small anticipated effect size (due mostly to noise in the outcome assessment that we can't think of any way to reduce with available technology), the sample size we need to adequately power our study turns out to be about 20% greater than we can manage within budget. So, if there were no losses to follow-up, we'd be just OK. But there will be losses to follow-up, and efforts to reduce them will also eat into the budget. (As would getting two outcome assessments and using the average, or doing a mixed model.)

So my colleague has suggested that when we analyze our data we use multiple imputation to make up for the missing data. I'm by no means opposed to doing that, but I don't think it will help us with regard to statistical power. I understand that MI lets you squeeze out all the information that is really in the existing data set, and it can even correct some of the bias that can result from listwise deletion. But in our case, the only missing data will be the outcome measurement; we will have complete data on everything else. So it seems to me that MI in this context will just amount to carrying out a listwise-deletion analysis, multiply extrapolating those results to the cases with missing outcomes, and then combining the analyses of the imputed data sets in a way that reflects the between-imputation variation.

If I am thinking about this correctly, the added variance from the multiple imputations should pretty much balance the reduction in standard error that comes from (appearing to) use the full sample size. If this were not true, MI would be synthesizing information ex nihilo. So my instincts tell me that we will not solve our statistical power problem by using MI analysis. I have run a few simulations, and they support my opinion, but I wanted to run this by some people who understand MI better than I do.

Thanks in advance for any help.

Clyde Schechter
Department of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, New York, USA
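
P.S. For concreteness, here is a minimal sketch of the kind of simulation I ran. The sample size, effect size, missingness rate, and variable names are purely illustrative, not our actual design. It imputes only the outcome and compares the complete-case standard error of the treatment effect with the MI standard error.

clear all
set seed 20120209

program define onesim, rclass
    // one replication: generate trial data, make ~18% of outcomes
    // missing (MCAR), then analyze by listwise deletion and by MI
    clear
    set obs 200                                // illustrative sample size
    generate byte treat = runiform() < .5      // randomized assignment
    generate x = rnormal()                     // a baseline covariate
    generate y = .25*treat + .5*x + rnormal()  // true effect = .25
    replace y = . if runiform() < .18          // losses to follow-up

    // complete-case (listwise deletion) analysis
    regress y treat x
    return scalar se_cc = _se[treat]

    // impute the outcome only, then combine with Rubin's rules
    mi set wide
    mi register imputed y
    mi impute regress y treat x, add(20)
    mi estimate, post: regress y treat x
    return scalar se_mi = _se[treat]
end

simulate se_cc=r(se_cc) se_mi=r(se_mi), reps(500): onesim
summarize se_cc se_mi

If my reasoning is right, the mean of se_mi should come out essentially equal to the mean of se_cc, and that is what my runs have shown.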