What is the relation between the official multiple-imputation command, mi, and the user-written ice and mim commands?

Title:   Relation between official mi and user-written ice and mim commands
Authors: Yulia Marchenko, StataCorp
         Patrick Royston, MRC Clinical Trials Unit and University College London
Date:    October 2009

Multiple-imputation analysis consists of three phases: 1) imputation, which creates multiply imputed data; 2) completed-data analysis of each imputed dataset; and 3) pooling of the individual analyses from phase 2 using Rubin’s combination rules (Rubin 1987, 76).
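
As a brief reminder of what phase 3 computes, the scalar form of Rubin’s combination rules can be written as follows. The notation is ours and is not taken from the mi documentation: $\hat{Q}_i$ and $U_i$ denote the point estimate and its variance estimate from the $i$th of $M$ completed datasets.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{M}\sum_{i=1}^{M}\hat{Q}_i
        && \text{(pooled point estimate)}\\
\bar{U} &= \frac{1}{M}\sum_{i=1}^{M}U_i
        && \text{(within-imputation variance)}\\
B       &= \frac{1}{M-1}\sum_{i=1}^{M}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2
        && \text{(between-imputation variance)}\\
T       &= \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B
        && \text{(total variance)}
\end{align*}
```

The pooled standard error is $\sqrt{T}$.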

User-written commands uvis, ice (Royston 2005, 2007, 2009), and mim (Carlin, Galati, and Royston 2008; Royston, Carlin, and White 2009) are widely used to perform multiple-imputation analysis in Stata 9 and higher. uvis and ice perform phase 1. The uvis command performs univariate imputation. The ice command performs multivariate imputation via chained equations (van Buuren, Boshuizen, and Knook 1999). The mim command analyzes multiply imputed data by performing phases 2 and 3. mim also provides some capabilities for manipulating multiply imputed data.

On 27 July 2009, Stata 11 was released, bearing a major new feature: the mi system for multiple imputation and estimation of models with multiply imputed data. The system comprises a new architecture for imputed datasets; commands for manipulating, checking, and validating such datasets; a command, mi impute, for doing imputation—phase 1; and a command, mi estimate, for combining estimation results using Rubin’s rules—phases 2 and 3. See the Multiple-Imputation Reference Manual (StataCorp 2009) for details.

mi impute can be used to perform univariate or multivariate imputation. It provides five univariate imputation methods: linear regression (mi impute regress), predictive mean matching (mi impute pmm), logistic regression (mi impute logit), ordered logistic regression (mi impute ologit), and multinomial logistic regression (mi impute mlogit), which are also supported by the uvis command. uvis also supports interval and negative binomial regression imputation methods.
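
The univariate method is selected by the mi impute subcommand. Below is a minimal sketch, assuming the mheart0 data used in the examples later in this FAQ, in which bmi is the only variable with missing values. The seed, the number of imputations, and the knn(5) setting are arbitrary choices for illustration; whether knn() may be omitted depends on the Stata release, so check help mi impute pmm.

```stata
* Sketch only: predictive mean matching instead of linear regression
webuse mheart0, clear
mi set mlong
mi register imputed bmi

* knn(5) draws each imputed value from the 5 nearest observed neighbors
mi impute pmm bmi attack smokes age female hsgrad, add(5) knn(5) rseed(123)
```

The other methods follow the same pattern with a different subcommand (regress, logit, ologit, or mlogit), provided the imputed variable is of the matching type.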

Multivariate imputation can be performed using mi impute monotone when the missingness pattern is monotone and using mi impute mvn when the missingness pattern is arbitrary. mi impute monotone implements a noniterative imputation method based on a sequence of independent univariate conditional imputations (Rubin 1987, 170–186). It is similar to the implementation of the monotone option of the ice command. mi impute mvn performs multivariate imputation assuming that the data have a multivariate normal distribution. It implements the NORM method of Schafer (1997)—an iterative Markov chain Monte Carlo method (data augmentation) based on multivariate normality. The ice command implements an alternative iterative multivariate-imputation method based on a sequence of univariate full conditional specifications, also known as imputation via chained equations. ice is not replicated in Stata’s official mi command and is needed for performing multiple imputation by chained equations.
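
For mi impute mvn, the variables to be imputed go before an equals sign and the complete predictors after it. Below is a minimal sketch, assuming the mheart0 data used later in this FAQ. With only one incomplete variable the multivariate machinery is not really needed; the point here is the syntax. The seed and the number of imputations are arbitrary.

```stata
* Sketch only: data augmentation under multivariate normality
webuse mheart0, clear
mi set mlong
mi register imputed bmi

* Imputed variables go to the left of "=", complete predictors to the right
mi impute mvn bmi = attack smokes age female hsgrad, add(5) rseed(123)
```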

In Stata 11, you can use the user-written command mi ice to perform imputation via chained equations. mi ice is available from Patrick Royston’s web page (net from http://www.homepages.ucl.ac.uk/~ucakjpr/stata/) under the heading mi_ice. mi ice is a wrapper for ice that understands the official mi data format.
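
A possible way to locate the package from within Stata, using the address given above. This is a sketch only; the installation itself is done by following the links that net from displays, and the site layout may have changed since this FAQ was written.

```stata
* List the packages offered at the site named above, then follow the mi_ice link
net from http://www.homepages.ucl.ac.uk/~ucakjpr/stata/

* After installation, confirm that ice is on the ado-path and show its version line
which ice
```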

mi estimate and the mi data-management routines cover most of the estimation capabilities and all the data-management capabilities of the mim command, and they offer additional features. Two features of mim are not supported by mi: Monte Carlo error computation (mim, mcerror; Royston, Carlin, and White 2009) and predictions (mim: predict).

The mi import ice and mi export ice commands make it easy to transport data between the existing ice/mim data format and the official mi data format.

Below we provide examples demonstrating how to switch between the mi and ice data formats. Because ice, mi ice, and mim are not part of official Stata, you must install them separately. You can use the findit command to locate the desired package and then follow the corresponding links for installation instructions.

Using mi import ice to import multiply imputed data created by ice into mi
Using mi export ice to export mi data to the format required by ice/mim

In our examples, we use fictional data, mheart0.dta, recording heart attacks. The primary objective is to examine the relationship between heart attacks and smoking adjusted for other factors such as age, body mass index, gender, and educational status. The variable recording body mass index, bmi, contains missing values. Thus we use multiple imputation to analyze the heart attack data.


Using mi import ice to import multiply imputed data created by ice into mi

If you want to transport multiply imputed data obtained previously from ice to mi, use mi import ice. If you want to impute missing values by using the chained-equation approach, use mi ice.

For example, suppose that you have multiply imputed data from ice and now want to manipulate or analyze them using the mi command. We do not have such data, so we use ice to create them. We impute missing values of the bmi variable using ice to create five imputations and store them in a separate file, icedata.dta. We also set the random-number seed for reproducibility.

. webuse mheart0
(Fictional heart attack data; bmi missing)

. ice bmi attack smokes age female hsgrad, saving(icedata) m(5) seed(123)


   #missing |
     values |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        132       85.71       85.71
          1 |         22       14.29      100.00
------------+-----------------------------------
      Total |        154      100.00

   Variable | Command | Prediction equation
------------+---------+-------------------------------------------------------
        bmi | regress | attack smokes age female hsgrad
     attack |         | [No missing data in estimation sample]
     smokes |         | [No missing data in estimation sample]
        age |         | [No missing data in estimation sample]
     female |         | [No missing data in estimation sample]
     hsgrad |         | [No missing data in estimation sample]
------------------------------------------------------------------------------

Imputing 
[Only 1 variable to be imputed, therefore no cycling needed]
.1.2.3.4.5
file icedata.dta saved

We now load icedata.dta, containing multiply imputed data, into memory and use mi import ice to import data to mi. We use the automatic option of mi import ice to identify and register imputed variables automatically.

. use icedata, clear
(Fictional heart attack data; bmi missing)

. mi import ice, automatic
(22 m=0 obs. now marked as incomplete)

We can now use any of the mi subcommands. For example, we can check characteristics of the imported mi data by using the mi describe command.

. mi describe

  Style:  flong
          last mi update 22sep2009 14:23:34, approximately 7 minutes ago

  Obs.:   complete          132
          incomplete         22  (M = 5 imputations)
          ---------------------
          total             154

  Vars.:  imputed:  1; bmi(22)

          passive: 0

          regular: 0

          system:  3; _mi_m _mi_id _mi_miss

         (there are 8 unregistered variables)

From the output above, we learn that our mi data are stored in the flong style, and contain five imputations and one registered imputed variable—bmi. To conserve memory, we now choose to switch to the memory-efficient mi data storage style, mlong, by using mi convert.

. mi convert mlong
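
A short sketch of how the style change might be verified, assuming the imported data are still in memory. mi query reports the current style and the number of imputations; the conversion to flong and back is shown only to illustrate that styles can be switched freely.

```stata
* Report the current style and the number of imputations
mi query

* Styles are interchangeable; clear permits conversion when the data
* in memory have unsaved changes
mi convert flong, clear
mi convert mlong, clear
```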

Next we analyze our multiply imputed data to examine the relationship between heart attacks and smoking adjusted for other factors using mi estimate: logit.

. mi estimate: logit attack smokes bmi age female hsgrad

Multiple-imputation estimates                     Imputations     =          5
Logistic regression                               Number of obs   =        154
                                                  Average RVI     =     0.0248
DF adjustment:   Large sample                     DF:     min     =     329.80
                                                          avg     =  125100.34
                                                          max     =  447329.84
Model F test:       Equal FMI                     F(   5,16477.7) =       3.44
Within VCE type:          OIM                     Prob > F        =     0.0041

------------------------------------------------------------------------------
      attack |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smokes |   1.180581    .354507     3.33   0.001     .4857551    1.875406
         bmi |   .0914414   .0472251     1.94   0.054     -.001459    .1843418
         age |   .0348427   .0153231     2.27   0.023     .0048094     .064876
      female |  -.1397504   .4148719    -0.34   0.736    -.9529011    .6734004
      hsgrad |    .148727   .4010005     0.37   0.711    -.6372217    .9346757
       _cons |  -5.076543   1.652779    -3.07   0.002    -8.321277    -1.83181
------------------------------------------------------------------------------
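
mi estimate accepts reporting options before the colon. Below is a minimal sketch, assuming the mi data from the example above are still in memory. vartable and dftable are options we believe are documented under mi estimate; check help mi estimate for your release.

```stata
* Same model as above, with additional tables reporting the variance
* components (within, between, total) and the degrees of freedom
mi estimate, vartable dftable: logit attack smokes bmi age female hsgrad
```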

It is only necessary to use mi import ice if you already have multiple imputations created by ice. If you need to create multiple imputations by chained equations, use mi ice.

. webuse mheart0
(Fictional heart attack data; bmi missing)

. mi set mlong

. mi ice bmi attack smokes age female hsgrad, add(5) seed(123)

   #missing |
     values |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        132       85.71       85.71
          1 |         22       14.29      100.00
------------+-----------------------------------
      Total |        154      100.00

   Variable | Command | Prediction equation
------------+---------+-------------------------------------------------------
        bmi | regress | attack smokes age female hsgrad
     attack |         | [No missing data in estimation sample]
     smokes |         | [No missing data in estimation sample]
        age |         | [No missing data in estimation sample]
     female |         | [No missing data in estimation sample]
     hsgrad |         | [No missing data in estimation sample]
------------------------------------------------------------------------------

Imputing 
[Only 1 variable to be imputed, therefore no cycling needed]
.1.2.3.4.5
(5 imputations added; M=5)

As with the official mi command, mi ice requires declaration of the storage style of the mi data. We again choose the memory-efficient style mlong. mi ice stores created multiply imputed data in the mi format automatically, so we can use any of the mi subcommands without needing to use mi import ice.

. mi estimate: logit attack smokes bmi age female hsgrad

Multiple-imputation estimates                     Imputations     =          5
Logistic regression                               Number of obs   =        154
                                                  Average RVI     =     0.0248
DF adjustment:   Large sample                     DF:     min     =     329.80
                                                          avg     =  125100.34
                                                          max     =  447329.84
Model F test:       Equal FMI                     F(   5,16477.7) =       3.44
Within VCE type:          OIM                     Prob > F        =     0.0041

------------------------------------------------------------------------------
      attack |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smokes |   1.180581    .354507     3.33   0.001     .4857551    1.875406
         bmi |   .0914414   .0472251     1.94   0.054     -.001459    .1843418
         age |   .0348427   .0153231     2.27   0.023     .0048094     .064876
      female |  -.1397504   .4148719    -0.34   0.736    -.9529011    .6734004
      hsgrad |    .148727   .4010005     0.37   0.711    -.6372217    .9346757
       _cons |  -5.076543   1.652779    -3.07   0.002    -8.321277    -1.83181
------------------------------------------------------------------------------

Using mi export ice to export mi data to the format required by ice/mim

Suppose that we impute missing values of the bmi variable with mi impute and want to check the Monte Carlo variability of imputations by using mim, mcerror. We impute continuous variable bmi with a univariate regression imputation method, mi impute regress. Prior to using the mi command, we must declare our data to be mi data. We use mi set to declare the style; we choose the mlong style. We also use mi register to register bmi as an imputed variable, as required by mi impute.

. webuse mheart0
(Fictional heart attack data; bmi missing)

. mi set mlong

. mi register imputed bmi
(22 m=0 obs. now marked as incomplete)

. mi impute regress bmi attack smokes age female hsgrad, add(5) rseed(123)

Univariate imputation                   Imputations =        5
Linear regression                             added =        5
Imputed: m=1 through m=5                    updated =        0

               |              Observations per m              
               |----------------------------------------------
      Variable |   complete   incomplete   imputed |     total
---------------+-----------------------------------+----------
           bmi |        132           22        22 |       154
--------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled in observations.)

Next we use mi export ice to convert mi data to the format supported by mim.

. mi export ice

We now obtain multiple-imputation estimates using mim, and we display Monte Carlo errors on replay.

. qui mim: logit attack smokes bmi age female hsgrad

. mim, mcerror

Multiple-imputation estimates (logit)                    Imputations =       5
Logistic regression                                      Minimum obs =     154
                                                         Minimum dof =    71.5

[Values displayed beneath estimates are Monte Carlo jackknife standard errors]
------------------------------------------------------------------------------
      attack |     Coef.  Std. Err.     t    P>|t|    [95% Conf. Int.]     FMI
-------------+----------------------------------------------------------------
      smokes |   1.19365   .357948    3.33   0.001    .491173  1.89613   0.015
             |   .018074   .004775    0.01   5.9e-05  .009022  .027642   0.015
             |
         bmi |   .098509   .051642    1.91   0.060    -.00445  .201468   0.244
             |   .010008   .003559    0.15   .0218    .008835  .017418   0.128
             |
         age |   .036008   .015521    2.32   0.021    .005547  .066469   0.020
             |   .000902   .000192    0.04   .0023    .000657  .001229   0.015
             |
      female |  -.113328   .416562   -0.27   0.786   -.930824  .704168   0.013
             |   .019682   .002377    0.05    .037    .016803  .023227   0.006
             |
      hsgrad |    .15552   .403454    0.39   0.700   -.636229  .947269   0.010
             |   .016554   .002459    0.04   .0288    .012437  .021043   0.003
             |
       _cons |  -5.32991    1.8006   -2.96   0.004   -8.89848 -1.76133   0.191
             |   .310139   .119922    0.13   .0028    .563829  .216917   0.111
------------------------------------------------------------------------------

References

Carlin, J. B., J. C. Galati, and P. Royston. 2008. A new framework for managing and analyzing multiply imputed data in Stata. Stata Journal 8: 49–67.
Royston, P. 2005. Multiple imputation of missing values: Update of ice. Stata Journal 5: 527–536.
Royston, P. 2007. Multiple imputation of missing values: Further update of ice, with an emphasis on interval censoring. Stata Journal 7: 445–464.
Royston, P. 2009. Multiple imputation of missing values: Further update of ice, with an emphasis on categorical variables. Stata Journal 9: 466–477.
Royston, P., J. B. Carlin, and I. R. White. 2009. Multiple imputation of missing values: New features for mim. Stata Journal 9: 252–264.
Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. Boca Raton, FL: Chapman & Hall/CRC.
StataCorp. 2009. Stata 11 Multiple-Imputation Reference Manual. College Station, TX: Stata Press.
van Buuren, S., H. C. Boshuizen, and D. L. Knook. 1999. Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine 18: 681–694.