Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Multiplying variables / Generating new variables after 'mi impute'
From: [email protected] (Yulia Marchenko, StataCorp LP)
To: [email protected]
Subject: Re: st: Multiplying variables / Generating new variables after 'mi impute'
Date: Wed, 28 Apr 2010 15:09:25 -0500
Moses Lee <[email protected]> asks how to create a product of two imputed
variables after -mi impute-:
> I need to perform production function regression. This requires 2 stage.
>
> 1) Impute missing variables
> 2) Generate new variables - by multiplying two existing variables after
> imputing missing variables.
>
> I have tried running 'mi impute mvn' command. However, I realised the
> imputed values do not replace the missing values in the original variables.
> This becomes a problem if I need to estimate new variables which require a
> product of the original variables.
>
> An example: X and Y are variables that require imputing. After running the
> mi impute command, I need to 'gen (newvar)=X*Y. However, if the missing
> values in the existing X and Y are not replaced with the imputed values, I'm
> unable to generate a new variable.
>
> Can someone advise on how to replace missing values with imputed values? It
> seems impossible to generate new vars with the estimated imputed values.
Moses's post raises two issues. The first is the mechanical issue of creating
passive variables -- variables derived from the imputed variables. The second
is the statistical issue of how to handle passive variables during imputation.
1. Mechanical issues
---------------------
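Moses mentioned using the -generate- command to create a product of two
variables and reported that it did not work for him.
To create passive variables based on the imputed variables, use the
-mi passive- command. It applies the same -generate- (or -replace-) statement
to the original data and to each imputation, so the product is computed from
the imputed values.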
As Maarten Buis mentioned in
http://www.stata.com/statalist/archive/2010-04/msg01603.html, -mi- provides
lots of "styles" in which multiply imputed data might be stored, and I don't
know which Moses is using. In some cases, Moses could use -generate-,
although he would need to follow that up with -mi register passive-.
Regardless of all that, -mi passive- can be used with all styles, and
it always works the same way.
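As a minimal sketch of the -mi passive- route, assuming the data are already
-mi set- and x and y have already been imputed (xy is an illustrative name for
the product, and w is a hypothetical outcome variable that does not appear in
Moses's post):
. * x, y, xy, and w are illustrative names
. * create the product in the original data and in every imputation
. mi passive: generate xy = x*y
. * fit the analysis model on each imputation and combine the results
. mi estimate: regress w x y xy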
Let me note that it is important to use -mi- specific commands in place of the
standard Stata commands when there is an -mi- specific alternative. The list
of the -mi- specific commands can be found in -help mi-. If there is no
-mi- specific version, before using the standard Stata construct, look
first at -mi xeq:-.
Anyway, there's no substitute for reading the manual. One thing you will
learn is to always use -mi passive- when working with passive variables.
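For a standard command with no -mi- specific version, the -mi xeq:- prefix
mentioned above might be used along these lines. This is only a sketch; it
assumes an -mi set- dataset with at least two imputations that contains a
passive variable named xy (a hypothetical name):
. * run -summarize- on the original data (m=0) and on every imputation
. mi xeq: summarize xy
. * restrict the command to m=0 and the first two imputations
. mi xeq 0 1 2: summarize xy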
2. Statistical issues
----------------------
Steve Samuels replied that instead of creating a product variable after
imputing the constituent variables, Moses should impute the product variable
directly (http://www.stata.com/statalist/archive/2010-04/msg01604.html). More
generally, Moses does need to ensure that the imputation model used captures
the structure of the analysis model of interest. If an interaction between
two variables is included in the analysis model, this interaction should also
be present or accounted for in the imputation model.
Two approaches for handling passive variables during imputation are considered
in the literature. I will refer to them as joint modeling (JM) and passive
imputation (PI).
Per JM, a passive variable is treated simply as another imputation variable
and standard imputation techniques are applied to it. For example, if Y and X
are being imputed using the multivariate normal model (MVN), then their
product Y*X is simply included as another variable in the model specification.
In Stata this would correspond to:
. gen yx = y*x
. mi set wide
. mi register imputed y x yx
. mi impute mvn y x yx ...
One drawback of JM is that it does not take into account the functional
relationship of yx with respect to other variables in the model. Also, the
assumption of joint normality in the presence of nonlinearities, such as the
product, is suspect. However, despite these drawbacks, this method is
currently being used in practice.
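One way to see the first drawback is to check whether the imputed yx still
equals the product of the imputed y and x. The following is a sketch only. It
assumes the JM commands above were run with at least one imputation added and
that yx is stored as a float (the default for -generate-):
. * in imputation m=1, count observations where yx differs from y*x
. * float() guards against storage-precision differences (assumes float yx)
. mi xeq 1: count if yx != float(y*x)
Under JM one would expect a nonzero count in observations where yx was
imputed, because nothing in the model forces yx to equal y*x.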
The PI method takes the functional relationship into account by including the
product term yx in the model as a product of imputed y and imputed x. The PI
approach is available within sequential (chained-equations) imputation as
implemented by the user-written command -ice-; type -findit ice- to locate the
command (in Stata 11, type -findit mi_ice- to locate the -mi--aware wrapper
for -ice-). Passive imputation would correspond to, I believe, the following
syntaxes of -ice- and -mi ice-:
. ice y x yx ..., m(20) passive(yx:y*x) ...
. mi ice y x yx ..., add(20) passive(yx:y*x) ...
Currently, there is no definite recommendation as to which method should be
used in practice, although Patrick Royston and his colleagues have been
investigating the performance of the two approaches and may have more insight
regarding these issues.
-- Yulia
[email protected]