JIBONAYAN RAYCHAUDHURI <[email protected]> asks if Stata 11's -mi- command
provides imputation methods for panel data:
> Can mi in Stata 11.0 perform imputation over panel data (which has been
> tsset)? Does the data need to be arranged in wide form (from long form)
> before mi can be applied to the data set?
-mi- does not provide imputation methods specifically designed for complex
data such as panel or longitudinal data, complex survey data, or time-series
data. The methods employed by -mi- rely on the iid assumption, which is
violated in these data, and to the best of my knowledge, imputation methods
that relax this assumption have yet to be fully developed.
In some cases, there are ways of using existing iid imputation methods to
impute complex data. For example, longitudinal data can be reshaped to wide
form (one variable for each time period) and then the MVN model can be used
for imputation. In Stata, you can use -mi impute mvn- to do that. Say we
have subjects' weights measured at three time periods: y1, y2, y3. We can
type
. mi set mlong                      // declare mi data (skip if already mi set)
. mi register imputed y1 y2 y3      // register variables to be imputed
. mi impute mvn y1 y2 y3, add(10)   // create 10 imputations
If you now want to reshape your data to long form after imputation, you can
use -mi reshape-:
. mi reshape long y, i(id) j(time)
where variable 'id' contains observation identifiers and new variable 'time'
will contain the time periods after the data are reshaped. Note that you
should use -mi reshape- rather than -reshape- to reshape _mi_ data.
Similarly, if your data are in long form, you can use -mi reshape wide- to
reshape them to wide form prior to using -mi impute-.
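As a rough sketch of the whole sequence when starting from long form, assuming
the data contain the variables 'id', 'time' (taking values 1, 2, 3), and 'y'
used above and have not yet been mi set (the mlong style is just one possible
choice):

. mi set mlong                          // declare the data to be mi data
. mi reshape wide y, i(id) j(time)      // long to wide: creates y1, y2, y3
. mi register imputed y1 y2 y3          // register variables to be imputed
. mi impute mvn y1 y2 y3, add(10)       // create 10 imputations
. mi reshape long y, i(id) j(time)      // back to long form for analysis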
In the presence of clustering or stratification, missing data can be imputed
conditionally on the design variables, provided there are not too many
clusters or strata. For example, if continuous variables x1 and x2 contain
missing values and data are stratified on race, you can account for
stratification by including variable 'race' as a factor variable in the
imputation model:
. mi register imputed x1 x2
. mi impute mvn x1 x2 = i.race, add(5)
In the above, I could also have used -mi impute monotone- instead if I knew
that the pattern of missing data was monotone.
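A minimal sketch of that alternative, assuming the data are already mi set and
x1 and x2 are registered as imputed. The use of linear regression as the
univariate method is my assumption for illustration; other methods may suit
your variables better:

. mi misstable nested x1 x2       // check whether the pattern is monotone
. mi impute monotone (regress) x1 x2 = i.race, add(5)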
See, for example, Rubin (1987) and Schafer (1997, 29-35, 372-377) for more
information about imputing complex data.
Jibonayan mentions the use of -tsset- which implies that the data are also
time-series data. I'm not aware of imputation methods applicable to filling
in time-series data.
In reply to Jibonayan's question, Martin Weiss <[email protected]> points
out:
> There is an -mi tsset- command as seen in which makes me think there is
> support for imputation for panel data...
Although -mi- does not provide direct methods for imputing panel data,
time-series data, etc., it does provide ways of 'mi setting' such data in case
users already have imputations for them from other sources and need to
perform data manipulation.
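For instance, assuming the data are already mi set and are in long form with
a panel identifier 'id' and a time variable 'time' (names borrowed from the
reshape example above, not from Jibonayan's data), the panel structure could
be declared as follows:

. mi xtset id time      // panel data; for a single series, -mi tsset time-

As with -mi reshape-, the mi version of the command should be used instead of
plain -xtset- or -tsset- once the data are mi set.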
Jibonayan also asks if a user-written command -levpet- can be used with -mi-:
> Is it possible to combine mi with the levpet method of generating TFP?,i.e.,
> can levpet be applied over imputed data sets and the overall TFP measures,
> thus generated, combined?
Technically, you can use -mi estimate- with -levpet- (or with any other
estimation command outside the list of supported commands in -help mi
estimation-) to obtain combined estimates of the coefficients if you specify
-mi estimate-'s option -cmdok-:
. mi estimate, cmdok: levpet ...
Statistically, it is your responsibility to verify that multiple imputation
(MI) is applicable to the estimation method used. In general, as long as
approximate (asymptotic) normality holds for an estimator and its variance
estimator is consistent for the true variability of the estimator in the
complete data, it should be ok to apply MI combination rules to this estimator.
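For reference, the combination rules in question (Rubin 1987) can be written
as follows for a scalar parameter. The notation is mine: \hat{Q}_m and U_m
denote the estimate and its variance estimate from the m-th completed dataset,
m = 1, ..., M:

    \bar{Q} = \frac{1}{M} \sum_{m=1}^{M} \hat{Q}_m
    \bar{U} = \frac{1}{M} \sum_{m=1}^{M} U_m
    B       = \frac{1}{M-1} \sum_{m=1}^{M} (\hat{Q}_m - \bar{Q})^2
    T       = \bar{U} + (1 + 1/M) B

Here \bar{Q} is the combined estimate, \bar{U} is the within-imputation
variance, B is the between-imputation variance, and T is the total variance
of \bar{Q}.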
Now, what Jibonayan really wants are combined estimates of the predictions
after using -mi estimate- with -levpet-, which -mi- does not provide. There is
no definitive recommendation on how predictions must be handled within the MI
framework. Jibonayan may want to check out user-written command -mim- (and
-mim: predict-, in particular) for a way of obtaining predicted values with
multiply-imputed data.
In any case, Jibonayan should first decide on what would be an appropriate way
of imputing the time-series data before performing the analysis.
References:
Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York:
Wiley.
Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. Boca Raton,
FL: Chapman & Hall/CRC.
-- Yulia
[email protected]