Title: Using the cmdok option to use mi estimate with commands that are not officially supported
Author: Miguel Dorta, StataCorp
The mi estimate prefix is used to analyze multiply imputed data by fitting a model to each of the imputed datasets and pooling individual results using Rubin's combination rules (Rubin 1996). It supports a number of estimation commands, including regress, mvreg, probit, and logit; see [MI] mi estimation for a full list. You can specify the cmdok option to allow mi estimate to work with community-contributed commands or commands that are not officially supported, but you must first verify that certain conditions are met.
To allow an unsupported estimation command to be prefixed by mi estimate, specify the cmdok option using the following syntax:
. mi estimate, cmdok: <estimation_command> ...
Here is an example using the ivprobit command (probit model with continuous endogenous regressors), which is not officially supported by mi estimate:
. use http://www.stata.com/support/faqs/dta/laborsup_imputed, clear

. mi estimate, cmdok: ivprobit fem_work fem_educ kids
>     (other_inc = male_educ), twostep

Multiple-imputation estimates                   Imputations       =         20
                                                Number of obs     =        500
                                                Average RVI       =     0.0305
                                                Largest FMI       =     0.0883
DF adjustment:   Large sample                   DF:     min       =   2,476.73
                                                        avg       = 113,071.18
                                                        max       = 205,905.89
Model F test:       Equal FMI                   F(   3,35473.6)   =      29.67
Within VCE type:      Twostep                   Prob > F          =     0.0000

             | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   other_inc |  -.0585911   .0094013    -6.23   0.000    -.0770174   -.0401648
    fem_educ |   .2294515   .0285153     8.05   0.000     .1735619    .2853411
        kids |  -.1843753   .0521693    -3.53   0.000    -.2866752   -.0820754
       _cons |    .350795   .4980673     0.70   0.481    -.6254047    1.326995
For mi estimate to apply Rubin's combination rules correctly, an unsupported estimation command must fulfill the following requirements:
For example, the pca command for principal component analysis is not currently supported by mi estimate. Let us try to prefix it with mi estimate, cmdok.
. webuse mhouses1993s30, clear
(Albuquerque Home Prices Feb15-Apr30, 1993)

. mi estimate, cmdok: pca price tax sqft, comp(2)
matrix e(b) is not set
matrix e(V) is not set
r(301);
As we can see, the cmdok option did not work because, by default, the pca command does not store the e(b) and e(V) matrices, so requirements 2 and 3 are violated.
After running an estimation command, you can type ereturn list to see whether it stores the required e() results listed above. If e(b) and e(V) are posted, the matrix list command (for example, matrix list e(b) and matrix list e(V)) displays their contents in more detail.
On the other hand, if the vce(normal) option is specified with the pca command, the variance–covariance matrix of the eigenvalues and eigenvectors is computed under the assumption of multivariate normality. The eigenvalues and eigenvectors of the retained components are then stored in e(b) as a coefficient vector, and the corresponding covariance matrix is stored in e(V). Let us see what happens if we now prefix pca, vce(normal) with mi estimate, cmdok.
. mi estimate, cmdok: pca price tax sqft, vce(normal) comp(2)

Multiple-imputation estimates                   Imputations       =         30
Principal components                            Number of obs     =        117
                                                Average RVI       =     0.0092
                                                Largest FMI       =     0.0479
DF adjustment:   Large sample                   DF:     min       =  12,732.29
                                                        avg       =   3.47e+08
Within VCE type: MULTIVARIATE NORMALITY                 max       =   2.62e+09

             | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
Eigenvalues  |
       Comp1 |   2.718188   .3553968     7.65   0.000     2.021624    3.414753
       Comp2 |   .1584574    .019509     8.12   0.000     .1202203    .1966946
-------------+----------------------------------------------------------------
Comp1        |
       price |   .5776828   .0180433    32.02   0.000     .5423186     .613047
         tax |   .5805305   .0170499    34.05   0.000     .5471133    .6139477
        sqft |   .5738171   .0192929    29.74   0.000     .5360037    .6116305
-------------+----------------------------------------------------------------
Comp2        |
       price |  -.5528039   .2298562    -2.40   0.016    -1.003357   -.1022512
         tax |  -.2359819   .3015663    -0.78   0.434    -.8270898    .3551259
        sqft |   .7952124   .0797496     9.97   0.000     .6388974    .9515274
The cmdok option worked properly because the four requirements above were satisfied.
This example is for illustration only. For most estimation commands, researchers are interested in the estimates stored in e(b), and mi estimate, cmdok will compute what they need. Notice, however, that the output from mi estimate does not present all the results of the pca output, because values not stored in e(b) (such as the proportion of variance explained) are ignored in the process. Postestimation results or graphs that rely on values other than those in e(b) will not be available for this example. Issues such as these are among the reasons some estimation commands are not officially supported.
If the four requirements above are met, mi estimate, cmdok will correctly apply Rubin's combination rules to multiply imputed data. However, mi estimate cannot determine whether the specific estimator has the properties required for the final MI results to be statistically valid. The user is responsible for checking whether the combination rules are applicable to the estimator of interest. In general, combination rules are applicable to estimators that are asymptotically normal and whose variance–covariance matrix is consistently estimated. Also, combination rules should be applied to estimators in the metric in which their sampling distributions are closest to the normal distribution. For more information about the conditions required for MI results to be statistically valid, see Rubin (1996).
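The sketch below is a rough illustration of the pooling step. It applies Rubin's combination rules to a single coefficient. The inputs are one element of e(b) and the matching diagonal element of e(V) from each of M imputations. It is a minimal sketch under stated assumptions, not Stata's implementation:

* It pools one scalar at a time, whereas mi estimate pools the whole coefficient vector and its covariance matrix.
* It uses only Rubin's large-sample degrees of freedom, the "Large sample" DF adjustment shown in the outputs above.
* The function name pool_rubin is hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar coefficient across M imputations with Rubin's rules.

    Hypothetical helper for illustration only (not part of Stata).
    estimates: the coefficient from each imputation (one element of e(b) per fit)
    variances: its squared standard error from each imputation (diagonal of e(V))
    Returns (pooled estimate, total standard error, RVI, large-sample df).
    Assumes the within-imputation variances are strictly positive.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need the same number (>= 2) of estimates and variances")

    q_bar = mean(estimates)          # pooled point estimate: average over imputations
    u_bar = mean(variances)          # within-imputation variance: average of the variances
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance

    rvi = (1 + 1 / m) * b / u_bar    # relative increase in variance due to missing data
    # Rubin's large-sample degrees of freedom; infinite when imputations agree exactly
    df = math.inf if b == 0 else (m - 1) * (1 + 1 / rvi) ** 2

    return q_bar, math.sqrt(t), rvi, df
```

For example, pool_rubin([1.0, 1.5, 0.5], [0.25, 0.5, 0.75]) gives the following:

* a pooled estimate of 1.0
* an RVI of 2/3
* 12.5 degrees of freedom

These inputs are made-up numbers, not results from the examples above. Pooling each coefficient separately reproduces the per-coefficient standard errors, but the full multivariate rules are needed for joint tests such as the model F test.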
In the earlier PCA example, inference on the eigenvalues and eigenvectors mainly relies on the assumption that the variables are multivariate normally distributed. In this case, the eigenvalues and eigenvectors can be estimated using maximum likelihood with the estimates being asymptotically (multivariate) normally distributed (Anderson 1963; Jackson 2003). If the analyzed variables are not multivariate normally distributed, the MI results above would not be statistically valid.
Anderson, T. W. 1963. Asymptotic theory for principal component analysis. Annals of Mathematical Statistics 34: 122–148.
Jackson, J. E. 2003. A User’s Guide to Principal Components. New York: Wiley.
Rubin, D. B. 1996. Multiple imputation after 18+ years. Journal of the American Statistical Association 91: 473–489.