--- On Sun, 15/2/09, Paul Allison <[email protected]> wrote:
> Graham is right. In multiple imputation, interactions should
> be imputed as though they are additional variables, not
> constructed by multiplying imputed values. The same is true if
> you have x and x^2 in a model. The x^2 term should be imputed
> just like any other variable, not constructed by squaring
> the imputed values of x. While this principle may seem
> counterintuitive, it is easily demonstrated by simulation that
> the more "natural" way to do it produces biased estimates.
If Paul had said that one should not impute y, x, and z and then,
after creating the imputed datasets, create x2 by squaring x, then
I would have agreed immediately. However, this is not how the
-passive()- option works. As I understand it, the -passive()-
option implies that while other variables are imputed the full
model, including interactions, polynomial terms, etc., is used.
Only when, for example, a squared term is itself imputed during
the Gibbs sampling is the knowledge about the deterministic
relationship between the variables used. So, the imputation model does
include the non-linearity / interaction terms but it also
respects the deterministic relationship between interaction
terms, polynomial terms, etc. So I expected it to be superior
to a model that adds noise where none exists (e.g. in the
relationship between x and x squared). So I took up Paul's
challenge and created the simulation below.
The procedure proposed by Paul does seem to result in biased
estimates, but the -passive()- option seems to perform worse.
This is unexpected for me, as the imputation model is exactly
correct for this data (I created the data that way), and in
the past I got simulations showing unbiased estimates from
-ice- models, so I expected that at least one of the two would
be unbiased. This suggests to me that there is an error in my
simulation, but I can't find it. I also sent this message to
Patrick Royston, who is to the best of my knowledge on the
statalist. Maybe he can spot the error.
-- Maarten
*---------------------- begin simulation ----------------------
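capture program drop sim
program define sim, rclass
    drop _all
    * three standard normal variables with pairwise correlation .5
    matrix C = (1, .5, .5 \ .5, 1, .5 \ .5, .5, 1)
    drawnorm w x z, n(400) corr(C)
    gen x2 = x^2
    * true model: the coefficient of x is 1
    gen y = x + x2 + z + w + rnormal(0,.5)
    * missingness in x (and hence x2) and in w depends on y and z
    replace x = . if runiform() < invlogit(y - z -3)
    replace x2 = . if x == .
    replace w = . if runiform() < invlogit(y + z -4)
    * complete case analysis
    reg y x x2 z w
    return scalar cc = _b[x]
    * x2 imputed as just another variable
    preserve
    ice y x x2 z w, m(5) clear
    micombine reg y x x2 z w
    return scalar full = _b[x]
    restore
    * x2 passively imputed as the square of the imputed x
    ice y x x2 z w, m(5) clear passive(x2:x^2)
    micombine reg y x x2 z w
    return scalar pas = _b[x]
end
* single test run; -exit- stops the do-file here, so run the
* two commands after it separately to carry out the full simulation
sim
exit
simulate cc=r(cc) full=r(full) pas=r(pas), reps(1000) : sim
* the vertical line marks the true coefficient of x
twoway kdensity cc || kdensity full || kdensity pas , xline(1)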
*--------------------- end simulation ---------------------------
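For comparison, the approach I said above I would reject (leaving
x2 out of the imputation model and squaring the imputed x
afterwards) could be added as a further arm along these lines.
This is an untested sketch: it is meant to sit inside -sim-
before the final -ice- call, and the returned name r(post) is my
own label, not part of the simulation above.
*---------------------- begin sketch ----------------------
* assumption: the data in memory are those created inside -sim-
preserve
drop x2
ice y x z w, m(5) clear
* x2 is constructed only after the imputations have been created
gen x2 = x^2
micombine reg y x x2 z w
return scalar post = _b[x]
restore
*----------------------- end sketch -----------------------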
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
> -----------------------------------------------------------------
> Paul D. Allison
> Department of Sociology
> University of Pennsylvania
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/