Title | Relation between official mi and community-contributed ice and mim commands
Authors | Yulia Marchenko, StataCorp; Patrick Royston, MRC Clinical Trials Unit and University College London
Note: Because the community-contributed ice command relies on random draws, results may differ from those produced by versions before Stata 14, which introduced the 64-bit Mersenne Twister pseudorandom-number generator.
Multiple-imputation analysis consists of three phases: 1) imputation—creating multiply imputed data, 2) completed data analysis of multiply imputed data, and 3) pooling of individual analyses from phase 2 using Rubin’s combination rules (Rubin 1987, 76).
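For reference, the pooling step in phase 3 can be written out for a single scalar parameter. The notation is introduced here for illustration and is not taken from the commands' output: M is the number of imputations, \hat{Q}_m is the estimate from the m-th completed dataset, and U_m is its estimated variance.

```latex
\bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M} U_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m-\bar{Q}\bigr)^2, \qquad
T = \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B
```

Here \bar{Q} is the pooled point estimate, \bar{U} the within-imputation variance, B the between-imputation variance, and T the total variance whose square root is the pooled standard error.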
Community-contributed commands uvis, ice (Royston 2005, 2007, 2009), and mim (Carlin, Galati, and Royston 2008; Royston, Carlin, and White 2009) are widely used to perform multiple-imputation analysis in Stata 9 and higher. uvis and ice perform phase 1. The uvis command performs univariate imputation. The ice command performs multivariate imputation via chained equations (van Buuren, Boshuizen, and Knook 1999). The mim command analyzes multiply imputed data by performing phases 2 and 3. mim also provides some capabilities for manipulating multiply imputed data.
On 27 July 2009, Stata 11 was released, bearing a major feature: the mi system for multiple imputation and estimation of models with multiply imputed data. The system comprises a new architecture for imputed datasets; commands for manipulating, checking, and validating such datasets; a command, mi impute, for doing imputation—phase 1; and a command, mi estimate, for combining estimation results using Rubin’s rules—phases 2 and 3. See the Multiple-Imputation Reference Manual (StataCorp 2023) for details. mi impute and mi estimate were expanded in Stata 12.
mi impute performs both univariate and multivariate imputation. There are nine univariate methods and three multivariate ones; please see the mi impute manual entry for a list. The nine univariate methods include two not available in uvis: Poisson and truncated normal imputation.
Multivariate imputation can be performed using mi impute monotone when the missingness pattern is monotone and using mi impute mvn or mi impute chained when the pattern is not monotone. mi impute monotone implements a noniterative imputation method based on a sequence of independent univariate conditional imputations (Rubin 1987, 170–186). It is similar to the implementation of the monotone option of the ice command. mi impute mvn performs multivariate imputation assuming that the data have a multivariate normal distribution. It implements the NORM method of Schafer (1997)—an iterative Markov chain Monte Carlo method (data augmentation) based on multivariate normality. The mi impute chained command implements an alternative iterative multivariate-imputation method based on a sequence of univariate full conditional specifications, also known as imputation via chained equations. mi impute chained was added in Stata 12 and uses the same method as implemented in the ice command.
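As a rough illustration of how these commands are invoked, the sketch below imputes two variables with mi impute chained. It assumes the Stata example dataset mheart5, in which both age and bmi contain missing values; that dataset, the number of imputations, and the seed are choices made here for illustration and are not part of this FAQ's examples.

```stata
* Illustrative sketch only. mheart5 is assumed to be the Stata example
* dataset in which both age and bmi contain missing values.
webuse mheart5, clear

* Declare the mi storage style and register the variables to be imputed
mi set mlong
mi register imputed age bmi

* Iterative imputation via chained equations (requires Stata 12 or later)
mi impute chained (regress) age bmi = attack smokes hsgrad female, add(5) rseed(123)

* Alternatives, to be run instead of the command above, not after it:
*   mi impute mvn age bmi = attack smokes hsgrad female, add(5) rseed(123)
*   mi impute monotone (regress) age bmi = attack smokes hsgrad female, add(5) rseed(123)
* (mi impute monotone applies only when the missingness pattern is monotone.)
```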
mi impute chained and ice use the same imputation method, but their features are not the same. mi impute chained supports factor variables. ice includes stepwise model selection and is compatible with all releases since Stata 9. If you have Stata 11 or later, you can use mi ice, a wrapper command for ice that understands the official mi data format. (mi ice is available from Patrick Royston’s web page under the heading mi_ice; in Stata, type net from http://www.homepages.ucl.ac.uk/~ucakjpr/stata.)
As of Stata 12, the official mi commands cover all data-management and most estimation capabilities of mim; one exception is mim’s category(combine) option for combining arbitrary scalars. (See stata.com/support/faqs/statistics/combine-results-with-multiply-imputed-data for information on combining arbitrary scalars using mi estimate.) If you wish to use mim and have Stata 11 or later, you can use mim2, which understands the official mi data format. mim2 is available from the same website as mi ice.
The mi import ice and mi export ice commands make it easy to transport data between the existing ice/mim data format and the official mi data format.
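A minimal round-trip sketch of the two commands follows. It assumes the Stata example dataset mheart1s20, which is taken here to be already mi set with 20 imputations; the dataset is an assumption made for illustration and is not used elsewhere in this FAQ.

```stata
* Illustrative sketch only. mheart1s20 is assumed to be a Stata example
* dataset that is already mi set and contains 20 imputations.
webuse mheart1s20, clear

* Convert the official mi data in memory to the ice/mim format
mi export ice, clear

* Convert back to the official mi format; automatic asks mi import ice to
* detect and register imputed variables, and clear permits replacing the
* unsaved data in memory
mi import ice, automatic clear
```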
Below we provide examples demonstrating how to switch between the mi and ice data formats. Because ice, mi ice, and mim are not part of official Stata, you should install them separately. You can use the search command to locate the desired package, and then follow the corresponding links for further instructions on installation.
Using mi import ice to import multiply imputed data created by ice into mi
In our examples, we use fictional data, mheart0.dta, recording heart attacks. The primary objective is to examine the relationship between heart attacks and smoking adjusted for other factors such as age, body mass index, gender, and educational status. The variable recording body mass index, bmi, contains missing values. Thus we use multiple imputation to analyze the heart attack data.
If you want to transport multiply imputed data obtained previously from ice to mi, use mi import ice.
For example, suppose you have multiply imputed data from ice and now want to perform data manipulation or analyze it using the mi command. We do not have such data, so we use ice to create it. We impute missing values of the bmi variable using ice to create five imputations and store them in a separate file, icedata.dta. We also set the random-number seed for reproducibility.
(Note: To run this example, you will need to install the community-contributed command, ice. You can obtain this command by typing ssc install ice in Stata.)
. webuse mheart0
(Fictional heart attack data; BMI missing)

. ice bmi attack smokes age female hsgrad, saving(icedata) m(5) seed(123)
   #missing |
     values |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        132       85.71       85.71
          1 |         22       14.29      100.00
------------+-----------------------------------
      Total |        154      100.00
   Variable | Command | Prediction equation
------------+---------+----------------------------------------
     attack |         | [No missing data in estimation sample]
     smokes |         | [No missing data in estimation sample]
        age |         | [No missing data in estimation sample]
     female |         | [No missing data in estimation sample]
     hsgrad |         | [No missing data in estimation sample]
        bmi | regress | attack smokes age female hsgrad
We now load icedata.dta, containing multiply imputed data, into memory and use mi import ice to import data to mi. We use the automatic option of mi import ice to identify and register imputed variables automatically.
. use icedata, clear
(Fictional heart attack data; BMI missing)

. mi import ice, automatic
(22 m=0 obs now marked as incomplete)
We can now use any of the mi subcommands. For example, we can check characteristics of the imported mi data by using the mi describe command.
. mi describe

Style: flong
       last mi update 27may2021 13:16:36, approximately 1 minute ago

Observations:
   Complete          132
   Incomplete         22  (M = 5 imputations)
   ---------------------
   Total             154
From the output above, we learn that our mi data are stored in the flong style and contain five imputations and one registered imputed variable—bmi. To conserve memory, we now choose to switch to the memory-efficient mi data storage style, mlong, by using mi convert.
. mi convert mlong
Next we analyze our multiply imputed data to examine the relationship between heart attacks and smoking adjusted for other factors using mi estimate: logit.
. mi estimate: logit attack smokes bmi age female hsgrad

Multiple-imputation estimates                   Imputations       =          5
Logistic regression                             Number of obs     =        154
                                                Average RVI       =     0.0298
                                                Largest FMI       =     0.1046
DF adjustment:   Large sample                   DF:     min       =     398.79
                                                        avg       =  18,342.11
                                                        max       =  48,184.53
Model F test:       Equal FMI                   F(   5,13096.8)   =       3.76
Within VCE type:          OIM                   Prob > F          =     0.0021

------------------------------------------------------------------------------
      attack | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      smokes |   1.258327   .3629043     3.47   0.001     .5470296    1.969624
         bmi |   .1143867   .0468168     2.44   0.015     .0223481    .2064253
         age |   .0358312   .0155562     2.30   0.021     .0053394     .066323
      female |  -.0827343   .4231973    -0.20   0.845    -.9123378    .7468693
      hsgrad |   .1919585   .4065251     0.47   0.637     -.604842    .9887591
       _cons |  -5.780003   1.683405    -3.43   0.001    -9.084615    -2.47539
------------------------------------------------------------------------------
It is only necessary to use mi import ice if you already have multiple imputations created by ice.