A while ago Alan Acock asked a question on interactions in an imputation model: <http://www.stata.com/statalist/archive/2009-02/msg00602.html>.
The main issue was that there is an emerging literature claiming that one
should not use the -passive()- option in -ice- (see -ssc d ice-), but
instead create interactions, squares, etc. in the un-imputed data and impute these as if they were normal variables:
John Graham (2009) "Missing Data Analysis: Making it Work in the Real World", Annual Review of Psychology, 60:549-576.
Paul von Hippel (2009) "How to impute interactions, squares, and other transformed variables", Sociological Methodology, 39:265-291.
Alan wrote:
> ice allows us to passively estimate an interaction term by estimating
> the main effects and then multiplying these together so the interaction
> of X&Y will be the imputed X times the imputed Y. This seems necessary
> to preserve the interpretation of the interaction.
>
> Graham says we need to include the interaction term. "The problem with
> excluding such variables from the imputation model is that all
> imputation is done under the assumption that the correlation is r = 0
> between the omitted variable and all other variables in the imputation."
> This is the same argument that Graham makes for imputing the dependent
> variable in the imputation (a sensible thing to do).
>
> I understand the importance of including the dependent variable when
> doing multiple imputations, and see how Graham could apply this to the
> interaction term, but it makes no sense to me to have an interaction of
> X and Y not equal X*Y.
Paul Allison responded:
> Graham is right. In multiple imputation, interactions should be imputed
> as though they are additional variables, not constructed by multiplying
> imputed values. The same is true if you have x and x^2 in a model. The
> x^2 term should be imputed just like any other variable, not constructed
> by squaring the imputed values of x. While this principle may seem
> counterintuitive, it is easily demonstrated by simulation that the more
> "natural" way to do it produces biased estimates.
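For the x and x^2 case, the two strategies differ only in whether -passive()- is specified. A minimal sketch, assuming -ice- is installed; the toy data, the variable names (x, xsq, y) and the file names are made up for illustration and are not taken from the thread:

*------------------ begin example ------------------
* toy data: y depends on x and x^2, x is missing completely at random
drop _all
set obs 250
gen x = rnormal()
gen y = x + x^2 + rnormal()
replace x = . if runiform() < .2
gen xsq = x^2    // created before imputing; missing wherever x is missing

* (a) passive: impute x, then recompute xsq from the imputed x
ice y x xsq, m(5) saving(imp_passive, replace) passive(xsq: x^2)

* (b) transform, then impute: xsq is imputed like any other variable
ice y x xsq, m(5) saving(imp_jav, replace)
*------------------- end example -------------------

Under (a) the imputed xsq always equals the square of the imputed x; under (b) it generally does not, which is the property Alan found counterintuitive.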
I was skeptical and tried to do the simulation Paul Allison mentions. I did
not have much time, and I did not get the simulation right. I still posted it,
in case my first attempt at a solution might be helpful to someone.
Right now I am about to start a new imputation project, so I thought it
was time to take this subject on again. I rewrote the simulation and ran it.
This time the results seem more reasonable. They support the claim by
von Hippel and Graham: -passive()- does seem to introduce some bias, and
first transforming and then imputing reduces it considerably. The true
interaction effect was 1; with -passive()- the bias was -.14 (MC standard
error = .0007), while without -passive()- the bias was -.007 (MC standard
error = .0002).
To run this simulation one needs:
1) a couple of hours,
2) -ice- (-ssc install ice-),
3) -mim- (-ssc install mim-),
4) -simsum- (see this talk at the last UK Stata Users' meeting: <http://ideas.repec.org/p/boc/usug09/08.html>).
*-------------------- begin simulation -----------------------
set more off
program drop _all
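program define sim, rclass
    * complete data; the true effect of every term is 1
    drop _all
    matrix C = (1, .25, .25 \ .25, 1, .25 \ .25, .25, 1)
    drawnorm x1 x2 x3, n(250) corr(C)
    gen x12= x1*x2
    gen y = x1 + x2 + x3 + x12 + .25*rnormal()

    * make x1 and x2 missing, with probability depending on y and x3
    * note: x12 was generated above and is not set to missing here; to do
    * so one could add: replace x12 = . if missing(x1, x2)
    replace x1 = . if runiform() < invlogit(-2 - y + x3)
    replace x2 = . if runiform() < invlogit(-2 - y + x3)

    * strategy 1: impute x1 and x2, derive x12 passively
    ice y x1 x2 x3 x12, m(5) clear passive(x12:x1*x2)
    mim, storebv : reg y x1 x2 x3 x12
    return scalar b = _b[x1]
    return scalar se = _se[x1]
    return scalar b12 = _b[x12]
    return scalar se12 = _se[x12]

    * go back to the original (un-imputed) data
    keep if _mj ==0
    drop _m*

    * strategy 2: impute x12 as if it were just another variable
    ice y x1 x2 x3 x12, m(5) clear
    mim, storebv : reg y x1 x2 x3 x12
    return scalar hb = _b[x1]
    return scalar hse = _se[x1]
    return scalar hb12 = _b[x12]
    return scalar hse12 = _se[x12]
end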
timer clear 1
timer on 1
simulate b=r(b) se=r(se) b12=r(b12) se12=r(se12) ///
hb=r(hb) hse=r(hse) hb12=r(hb12) hse12 = r(hse12), ///
reps(10000) : sim
timer off 1
timer list
simsum b hb, true(1) se(se hse) mcse
simsum b12 hb12 , true(1) se(se12 hse12) mcse
*------------------------- end simulation ------------------------
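For readers without -simsum-, the bias of the interaction effect and its Monte Carlo standard error can be approximated by hand from the results that -simulate- leaves in memory. A minimal sketch, assuming the simulation above has just been run, that the true effect is 1, and that the Monte Carlo standard error of the bias is taken as the standard deviation of the estimates divided by the square root of the number of replications (the scalar names are made up for illustration):

*------------------ begin example ------------------
* b12  = interaction effect with -passive()-
* hb12 = interaction effect without -passive()-
foreach v of varlist b12 hb12 {
    quietly summarize `v'
    scalar bias_`v' = r(mean) - 1
    scalar mcse_`v' = r(sd)/sqrt(r(N))
    display "`v': bias = " %7.4f bias_`v' "   MC s.e. = " %7.4f mcse_`v'
}
*------------------- end example -------------------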
( For more on how to use examples I sent to statalist see:
http://www.maartenbuis.nl/stata/exampleFAQ.html )
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------