Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Multiplying variables / Generating new variables after 'mi impute'
From: Alan Acock <[email protected]>
To: [email protected]
Subject: Re: st: Multiplying variables / Generating new variables after 'mi impute'
Date: Wed, 28 Apr 2010 15:53:06 -0700
One problem with passive imputation of xy is that you will be adding information that is not consistent with the observed data.
Alan Acock
[email protected] wrote:
>Moses Lee <[email protected]> asks how to create a product of two imputed
>variables after -mi impute-:
>
>> I need to run a production function regression. This requires two stages:
>>
>> 1) Impute the missing values.
>> 2) Generate new variables by multiplying two existing variables after
>> imputing the missing values.
>>
>> I have tried running the 'mi impute mvn' command. However, I realised that
>> the imputed values do not replace the missing values in the original
>> variables. This becomes a problem if I need to create new variables that
>> are products of the original variables.
>>
>> An example: X and Y are variables that require imputing. After running the
>> mi impute command, I need to run 'gen newvar = X*Y'. However, if the
>> missing values in the existing X and Y are not replaced with the imputed
>> values, I'm unable to generate the new variable.
>>
>> Can someone advise on how to replace missing values with imputed values? It
>> seems impossible to generate new variables from the imputed values.
>
>Moses's post raises two issues. The first is the mechanical issue of creating
>passive variables, that is, variables derived from the imputed variables. The
>second is the statistical issue of how to handle passive variables during
>imputation.
>
>
>1. Mechanical issues
>---------------------
>
>Moses tried to use the -generate- command to create a product of two
>variables, and that did not work for him.
>
>To create passive variables based on the imputed variables, use the
>-mi passive- command. It will work.
>
>As Maarten Buis mentioned in
>http://www.stata.com/statalist/archive/2010-04/msg01603.html, -mi- provides
>several "styles" in which multiply imputed data can be stored (wide, mlong,
>flong, and flongsep), and I don't know which one Moses is using. In some
>styles, Moses could use -generate-, although he would need to follow it up
>with -mi register passive-. Regardless of the style, -mi passive- can be
>used, and it always works the same way.
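>
>For concreteness, a minimal, self-contained sketch of the -mi passive- route
>follows. The simulated data, the seed, and the choice of 5 imputations are
>illustrative assumptions rather than anything from Moses's data; y, x, and yx
>stand in for his Y, X, and their product:
>
> . * simulate two correlated variables, then make about 20% of each missing
> . clear
> . set obs 200
> . set seed 12345
> . generate x = rnormal()
> . generate y = 1 + 0.5*x + rnormal()
> . replace x = . if runiform() < 0.2
> . replace y = . if runiform() < 0.2
> . * declare the data, impute y and x, then derive the product passively
> . mi set wide
> . mi register imputed y x
> . mi impute mvn y x, add(5) rseed(12345)
> . mi passive: generate double yx = y*x
> . mi query
>
>In wide style, -mi passive- computes yx in the original data (missing
>wherever y or x is missing) and in each imputation (stored as _1_yx through
>_5_yx), and registers yx as passive; -mi query- reports which style is in use.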
>
>It is important to use -mi--specific commands in place of the standard Stata
>commands whenever an -mi--specific alternative exists. The -mi--specific
>commands are listed in -help mi-. If there is no -mi--specific version, look
>first at -mi xeq:- before falling back on the standard Stata command.
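>
>As a hedged illustration, assuming the data are already -mi set- and contain
>the variables y, x, and yx, the standard -summarize- command can be run on the
>original data (m=0) and on the first two imputations with:
>
> . mi xeq 0 1 2: summarize y x yx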
>
>Anyway, there's no substitute for reading the manual. One thing you will
>learn is to always use -mi passive- when working with passive variables.
>
>
>2. Statistical issues
>----------------------
>
>Steve Samuels replied that instead of creating a product variable after
>imputing the constituent variables, Moses should impute the product variable
>directly (http://www.stata.com/statalist/archive/2010-04/msg01604.html). More
>generally, Moses needs to ensure that the imputation model captures the
>structure of the analysis model of interest. If an interaction between two
>variables is included in the analysis model, this interaction should also be
>present, or otherwise accounted for, in the imputation model.
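>
>A minimal sketch of the corresponding analysis step, assuming y and x have
>been imputed, yx was created with -mi passive-, and q is a hypothetical
>outcome variable (for instance, output in Moses's production function):
>
> . mi estimate: regress q y x yx
>
>-mi estimate- fits the regression in each imputation and combines the results
>using Rubin's rules.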
>
>Two approaches for handling passive variables during imputation are considered
>in the literature. I will refer to them as joint modeling (JM) and passive
>imputation (PI).
>
>Under JM, a passive variable is treated simply as another imputation variable
>and standard imputation techniques are applied to it. For example, if Y and X
>are being imputed using the multivariate normal model (MVN), then their
>product Y*X is simply included as another variable in the model specification.
>In Stata this would correspond to:
>
> . gen yx = y*x
> . mi set wide
> . mi register imputed y x yx
> . mi impute mvn y x yx ...
>
>One drawback of JM is that it does not take into account the functional
>relationship yx = y*x: the imputed yx is not constrained to equal the product
>of the imputed y and x. Also, the assumption of joint normality is suspect in
>the presence of nonlinearities such as the product. Despite these drawbacks,
>this method is currently used in practice.
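>
>The first drawback is easy to check, assuming the JM commands above have been
>run with their elided options (such as add()) filled in. In the first
>imputation, the count below compares yx with the product of y and x, rounded
>to float precision as -generate- stores it; it is expected to be positive
>whenever some values of yx had to be imputed:
>
> . mi xeq 1: count if yx != float(y*x)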
>
>The PI method takes the functional relationship into account by including the
>product term yx in the model as the product of imputed y and imputed x. The PI
>approach is available within sequential imputation as implemented by the
>user-written command -ice-; type -findit ice- to locate the command (in Stata
>11, type -findit mi_ice- to locate the -mi--aware wrapper for -ice-). Passive
>imputation would correspond, I believe, to the following syntax of -ice- and
>-mi ice-:
>
> . ice y x yx ..., m(20) passive(yx:y*x) ...
> . mi ice y x yx ..., add(20) passive(yx:y*x) ...
>
>Currently, there is no definite recommendation as to which method should be
>used in practice, although Patrick Royston and his colleagues have been
>investigating the performance of the two approaches and may have more insight
>regarding these issues.
>
>
>-- Yulia
>[email protected]
>*
>* For searches and help try:
>* http://www.stata.com/help.cgi?search
>* http://www.stata.com/support/statalist/faq
>* http://www.ats.ucla.edu/stat/stata/