What is the relation between the official multiple-imputation command,
mi, and the user-written ice and mim commands?
Title: Relation between official mi and user-written ice and mim commands
Authors: Yulia Marchenko, StataCorp
         Patrick Royston, MRC Clinical Trials Unit and University College London
Date: October 2009
Multiple-imputation analysis consists of three phases: 1) imputation,
which creates multiply imputed data; 2) completed-data analysis of each
imputed dataset; and 3) pooling of the individual analyses from phase
2 using Rubin’s combination rules (Rubin 1987, 76).
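For a single coefficient, the pooling in phase 3 can be written out as follows. The symbols are introduced here only for illustration and do not appear elsewhere in this FAQ: M is the number of imputations, \hat{q}_m is the estimate from the m-th completed dataset, and u_m is its estimated variance.

```latex
\begin{align*}
\bar{q} &= \frac{1}{M}\sum_{m=1}^{M} \hat{q}_m, \\
W &= \frac{1}{M}\sum_{m=1}^{M} u_m, \\
B &= \frac{1}{M-1}\sum_{m=1}^{M} \left(\hat{q}_m - \bar{q}\right)^2, \\
T &= W + \left(1 + \frac{1}{M}\right) B .
\end{align*}
```

The pooled point estimate is \bar{q} and its standard error is \sqrt{T}; W and B are the within-imputation and between-imputation variances.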
User-written commands uvis, ice (Royston 2005, 2007,
2009), and mim (Carlin, Galati, and Royston 2008; Royston, Carlin,
and White 2009) are widely used to perform multiple-imputation analysis in
Stata 9 and higher. uvis and ice perform phase 1. The
uvis command performs univariate imputation. The ice command
performs multivariate imputation via chained equations (van Buuren, Boshuizen, and Knook 1999). The mim command analyzes multiply imputed data by performing
phases 2 and 3. mim also provides some capabilities for
manipulating multiply imputed data.
On 27 July 2009, Stata 11 was released, bearing a major new feature: the
mi system for
multiple imputation and estimation of models with multiply imputed data.
The system comprises a new architecture for imputed datasets; commands for
manipulating, checking, and validating such datasets; a command, mi
impute, for doing imputation—phase 1; and a command, mi
estimate, for combining estimation results using Rubin’s
rules—phases 2 and 3. See the Multiple-Imputation
Reference Manual (StataCorp 2009) for details.
mi impute
can be used to perform univariate or multivariate imputation. It provides five
univariate imputation methods: linear regression (mi impute regress),
predictive mean matching (mi impute pmm), logistic regression (mi
impute logit), ordered logistic regression (mi impute ologit),
and multinomial logistic regression (mi impute mlogit), which are
also supported by the uvis command. uvis also
supports interval and negative binomial regression imputation methods.
Multivariate imputation can be performed using mi impute monotone
when the missingness pattern is monotone and using mi impute mvn when
the missingness pattern is arbitrary. mi impute monotone implements
a noniterative imputation method based on a sequence of independent
univariate conditional imputations (Rubin 1987, 170–186). It is
similar to the implementation of the monotone option of the
ice command. mi impute mvn performs multivariate imputation
assuming that the data have a multivariate normal distribution. It
implements the NORM method of Schafer (1997)—an iterative Markov chain
Monte Carlo method (data augmentation) based on multivariate normality. The
ice command implements an alternative iterative
multivariate-imputation method based on a sequence of univariate full
conditional specifications, also known as imputation via chained equations.
Because Stata’s official mi command does not replicate ice, ice is
still needed to perform multiple imputation by chained equations.
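The two official multivariate methods described above might be invoked as in the following sketch. It assumes the example dataset mheart5 (a variant of the heart attack data in which both age and bmi contain missing values in a monotone pattern) is available through webuse; the dataset, seed, and number of imputations are illustrative choices and are not part of this FAQ.

```stata
* Sketch only: mheart5, add(5), and rseed(123) are illustrative assumptions.

* Monotone missingness: noniterative sequence of univariate imputations
webuse mheart5, clear
mi set mlong
mi register imputed age bmi
* Check that the missing-value pattern is monotone
mi misstable nested
mi impute monotone (regress) age bmi = attack smokes female hsgrad, add(5) rseed(123)

* Arbitrary missingness: data augmentation under multivariate normality
webuse mheart5, clear
mi set mlong
mi register imputed age bmi
mi impute mvn age bmi = attack smokes female hsgrad, add(5) rseed(123)
```

The data are reloaded before the second call so that each method starts from unimputed data; otherwise, add(5) would append five more imputations to the existing ones.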
In Stata 11, you can use the user-written command mi ice to perform
imputation via chained equations. mi ice is available from Patrick Royston’s web page
(net from http://www.homepages.ucl.ac.uk/~ucakjpr/stata/) under the
heading mi_ice. mi ice is a wrapper for ice that understands
the official mi data format.
mi estimate and the mi data-management routines cover most
estimation and all data-management capabilities of the mim command and
offer additional features. The features of mim not supported by
mi are Monte Carlo error computation (mim, mcerror; see Royston,
Carlin, and White 2009) and predictions (mim: predict).
The mi import ice and mi export ice commands make it easy to
transport data between the existing ice/mim data format and
the official mi data format.
Below we provide examples demonstrating how to switch between the mi
and ice data formats. Because ice, mi ice, and mim are not part of
official Stata, you must install them separately. You can use the findit
command to locate the desired package and then follow the corresponding
links for installation instructions.
- Using mi import ice to import multiply imputed data created by ice into mi
- Using mi export ice to export mi data to the format required by ice/mim
In our examples, we use fictional data, mheart0.dta, recording heart
attacks. The primary objective is to examine the relationship between heart
attacks and smoking adjusted for other factors such as age, body mass index,
gender, and educational status. The variable recording body mass index,
bmi, contains missing values. Thus we use multiple imputation to
analyze the heart attack data.
Using mi import ice to import multiply imputed data
created by ice into mi
If you want to transport multiply imputed data obtained previously from
ice to mi, use mi import ice. If you want to impute
missing values by using the chained-equation approach, use mi ice.
For example, suppose that you have multiply imputed data from ice and
now want to manipulate or analyze them using the mi command. We do not
have such data, so we use ice to create them. We
impute missing values of the bmi variable using ice to create
five imputations and store them in a separate file, icedata.dta. We
also set the random-number seed for reproducibility.
. webuse mheart0
(Fictional heart attack data; bmi missing)
. ice bmi attack smokes age female hsgrad, saving(icedata) m(5) seed(123)
#missing |
values | Freq. Percent Cum.
------------+-----------------------------------
0 | 132 85.71 85.71
1 | 22 14.29 100.00
------------+-----------------------------------
Total | 154 100.00
Variable | Command | Prediction equation
------------+---------+-------------------------------------------------------
bmi | regress | attack smokes age female hsgrad
attack | | [No missing data in estimation sample]
smokes | | [No missing data in estimation sample]
age | | [No missing data in estimation sample]
female | | [No missing data in estimation sample]
hsgrad | | [No missing data in estimation sample]
------------------------------------------------------------------------------
Imputing
[Only 1 variable to be imputed, therefore no cycling needed]
.1.2.3.4.5
file icedata.dta saved
We now load icedata.dta, containing multiply imputed data, into
memory and use mi import ice to import data to mi. We use the
automatic option of mi import ice to identify and register
imputed variables automatically.
. use icedata, clear
(Fictional heart attack data; bmi missing)
. mi import ice, automatic
(22 m=0 obs. now marked as incomplete)
We can now use any of the mi subcommands. For example, we can check
characteristics of the imported mi data by using the mi
describe command.
. mi describe
Style: flong
last mi update 22sep2009 14:23:34, approximately 7 minutes ago
Obs.: complete 132
incomplete 22 (M = 5 imputations)
---------------------
total 154
Vars.: imputed: 1; bmi(22)
passive: 0
regular: 0
system: 3; _mi_m _mi_id _mi_miss
(there are 8 unregistered variables)
From the output above, we learn that our mi data are stored in the
flong style, and contain five imputations and one registered imputed
variable—bmi. To conserve memory, we now choose to switch to
the memory-efficient mi data storage style, mlong, by using
mi convert.
. mi convert mlong
Next we analyze our multiply imputed data to examine the relationship
between heart attacks and smoking adjusted for other factors using mi
estimate: logit.
. mi estimate: logit attack smokes bmi age female hsgrad
Multiple-imputation estimates Imputations = 5
Logistic regression Number of obs = 154
Average RVI = 0.0248
DF adjustment: Large sample DF: min = 329.80
avg = 125100.34
max = 447329.84
Model F test: Equal FMI F( 5,16477.7) = 3.44
Within VCE type: OIM Prob > F = 0.0041
------------------------------------------------------------------------------
attack | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smokes | 1.180581 .354507 3.33 0.001 .4857551 1.875406
bmi | .0914414 .0472251 1.94 0.054 -.001459 .1843418
age | .0348427 .0153231 2.27 0.023 .0048094 .064876
female | -.1397504 .4148719 -0.34 0.736 -.9529011 .6734004
hsgrad | .148727 .4010005 0.37 0.711 -.6372217 .9346757
_cons | -5.076543 1.652779 -3.07 0.002 -8.321277 -1.83181
------------------------------------------------------------------------------
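To see what mi estimate does in phases 2 and 3, the pooled coefficient and standard error for smokes can be reproduced by hand. The following is a minimal sketch, assuming that icedata.dta from the ice run above is in the current directory and that ice's _mj variable indexes the imputations (0 for the original data, 1 through 5 for the imputed copies). The scalar names are ours.

```stata
* Manual pooling of the smokes coefficient with Rubin's rules (sketch).
* Assumption: icedata.dta is in ice format, with _mj = 1,...,5 marking imputations.
preserve
use icedata, clear
local M = 5
scalar qsum  = 0
scalar q2sum = 0
scalar usum  = 0
forvalues m = 1/`M' {
    * Phase 2: fit the model on the m-th completed dataset
    quietly logit attack smokes bmi age female hsgrad if _mj == `m'
    scalar qsum  = qsum  + _b[smokes]
    scalar q2sum = q2sum + _b[smokes]^2
    scalar usum  = usum  + _se[smokes]^2
}
* Phase 3: combine the M sets of results
scalar qbar = qsum/`M'
* Within-imputation variance
scalar W = usum/`M'
* Between-imputation variance
scalar B = (q2sum - `M'*qbar^2)/(`M' - 1)
* Total variance
scalar T = W + (1 + 1/`M')*B
display "pooled coef. = " qbar "   pooled std. err. = " sqrt(T)
restore
```

Because the same five imputations are used, the displayed values should agree, up to rounding, with the smokes row reported by mi estimate. The preserve and restore commands keep the mi data in memory unchanged.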
You need mi import ice only if you already have multiple imputations
created by ice. If you need to create multiple imputations by chained
equations, use mi ice directly.
. webuse mheart0
(Fictional heart attack data; bmi missing)
. mi set mlong
. mi ice bmi attack smokes age female hsgrad, add(5) seed(123)
#missing |
values | Freq. Percent Cum.
------------+-----------------------------------
0 | 132 85.71 85.71
1 | 22 14.29 100.00
------------+-----------------------------------
Total | 154 100.00
Variable | Command | Prediction equation
------------+---------+-------------------------------------------------------
bmi | regress | attack smokes age female hsgrad
attack | | [No missing data in estimation sample]
smokes | | [No missing data in estimation sample]
age | | [No missing data in estimation sample]
female | | [No missing data in estimation sample]
hsgrad | | [No missing data in estimation sample]
------------------------------------------------------------------------------
Imputing
[Only 1 variable to be imputed, therefore no cycling needed]
.1.2.3.4.5
(5 imputations added; M=5)
As with the official mi command, mi ice requires declaration
of the storage style of the mi data. We again choose the
memory-efficient style mlong. mi ice stores created
multiply imputed data in the mi format automatically, so we can use
any of the mi subcommands without needing to use mi import
ice.
. mi estimate: logit attack smokes bmi age female hsgrad
Multiple-imputation estimates Imputations = 5
Logistic regression Number of obs = 154
Average RVI = 0.0248
DF adjustment: Large sample DF: min = 329.80
avg = 125100.34
max = 447329.84
Model F test: Equal FMI F( 5,16477.7) = 3.44
Within VCE type: OIM Prob > F = 0.0041
------------------------------------------------------------------------------
attack | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smokes | 1.180581 .354507 3.33 0.001 .4857551 1.875406
bmi | .0914414 .0472251 1.94 0.054 -.001459 .1843418
age | .0348427 .0153231 2.27 0.023 .0048094 .064876
female | -.1397504 .4148719 -0.34 0.736 -.9529011 .6734004
hsgrad | .148727 .4010005 0.37 0.711 -.6372217 .9346757
_cons | -5.076543 1.652779 -3.07 0.002 -8.321277 -1.83181
------------------------------------------------------------------------------
Using mi export ice to export mi data to the format
required by ice/mim
Suppose that we impute missing values of the bmi variable with mi
impute and want to check the Monte Carlo variability of imputations by using
mim, mcerror. We impute the continuous variable bmi with a
univariate regression imputation method, mi impute regress. Prior to
using the mi command, we must declare our data to be mi data.
We use mi set to declare the style; we choose the mlong style.
We also use mi register to register bmi as an imputed variable,
as required by mi impute.
. webuse mheart0
(Fictional heart attack data; bmi missing)
. mi set mlong
. mi register imputed bmi
(22 m=0 obs. now marked as incomplete)
. mi impute regress bmi attack smokes age female hsgrad, add(5) rseed(123)
Univariate imputation Imputations = 5
Linear regression added = 5
Imputed: m=1 through m=5 updated = 0
| Observations per m
|----------------------------------------------
Variable | complete incomplete imputed | total
---------------+-----------------------------------+----------
bmi | 132 22 22 | 154
--------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
of the number of filled in observations.)
Next we use mi export ice to convert mi data to the format supported by mim.
. mi export ice
We now obtain multiple-imputation estimates using mim, and we
display Monte Carlo errors on replay.
. qui mim: logit attack smokes bmi age female hsgrad
. mim, mcerror
Multiple-imputation estimates (logit) Imputations = 5
Logistic regression Minimum obs = 154
Minimum dof = 71.5
[Values displayed beneath estimates are Monte Carlo jackknife standard errors]
------------------------------------------------------------------------------
attack | Coef. Std. Err. t P>|t| [95% Conf. Int.] FMI
-------------+----------------------------------------------------------------
smokes | 1.19365 .357948 3.33 0.001 .491173 1.89613 0.015
| .018074 .004775 0.01 5.9e-05 .009022 .027642 0.015
|
bmi | .098509 .051642 1.91 0.060 -.00445 .201468 0.244
| .010008 .003559 0.15 .0218 .008835 .017418 0.128
|
age | .036008 .015521 2.32 0.021 .005547 .066469 0.020
| .000902 .000192 0.04 .0023 .000657 .001229 0.015
|
female | -.113328 .416562 -0.27 0.786 -.930824 .704168 0.013
| .019682 .002377 0.05 .037 .016803 .023227 0.006
|
hsgrad | .15552 .403454 0.39 0.700 -.636229 .947269 0.010
| .016554 .002459 0.04 .0288 .012437 .021043 0.003
|
_cons | -5.32991 1.8006 -2.96 0.004 -8.89848 -1.76133 0.191
| .310139 .119922 0.13 .0028 .563829 .216917 0.111
------------------------------------------------------------------------------
References
- Carlin, J. B., J. C. Galati, and P. Royston. 2008. A new framework for managing and analyzing multiply imputed data in Stata. Stata Journal 8: 49–67.
- Royston, P. 2005. Multiple imputation of missing values: Update of ice. Stata Journal 5: 527–536.
- Royston, P. 2007. Multiple imputation of missing values: Further update of ice, with an emphasis on interval censoring. Stata Journal 7: 445–464.
- Royston, P. 2009. Multiple imputation of missing values: Further update of ice, with an emphasis on categorical variables. Stata Journal 9: 466–477.
- Royston, P., J. B. Carlin, and I. R. White. 2009. Multiple imputation of missing values: New features for mim. Stata Journal 9: 252–264.
- Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
- Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. Boca Raton, FL: Chapman & Hall/CRC.
- StataCorp. 2009. Stata 11 Multiple-Imputation Reference Manual. College Station, TX: Stata Press.
- van Buuren, S., H. C. Boshuizen, and D. L. Knook. 1999. Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine 18: 681–694.