FAQ: Combining results other than coefficients in e(b) with multiply imputed data

Home / Resources & support / FAQs / Combining results other than coefficients in e(b) with multiply imputed data

How can I combine results other than coefficients in e(b) with multiply imputed data?

Title		Combining results other than coefficients in e(b) with multiply imputed data
Authors		Isabel Canette and Yulia Marchenko, StataCorp

mi estimate estimates parameters from multiply imputed data and adjusts these estimates and their respective standard errors for the imputation uncertainty using Rubin’s combination rules. mi estimate is designed to work with Stata estimation commands. As such, it combines the estimates of coefficients, which are stored in matrix e(b), and their respective variance–covariance estimates (VCE), stored in matrix e(V). You may also want to combine results other than e(b). The aim of this tutorial is to show you how to do that.

This entry is organized as follows:

Finding out where your results are saved
Combining point estimates only
Combining point estimates and their variances
An example using suest

Finding out where your results are saved

First, you need to know where your results are saved. Most Stata commands save results either in e() or r(). You can type

. ereturn list

. return list

to find out where your value of interest is saved after running a command. Alternatively, you can look at the Stored results section in the help file or documentation for the command.

Combining point estimates only

In some situations, we are interested in a point estimate, such as the R-squared (\(R^2\)) value from a regression, not in its variance. According to Rubin's rules, the estimate of the value of interest should be computed for each imputation, and the overall value will be the mean of these estimates. We can do this manually, taking advantage of mi xeq, which allows you to run sequences of commands of interest on each individual imputation.

In the following example, results are combined for the \(R^2\) value from a linear regression. We use this example strictly for the purpose of illustration. See the unofficial command mibeta (type . search mibeta to locate and install this command), which automatically provides \(R^2\), adjusted \(R^2\), and standardized coefficients after regression for imputed data.

We will use the dataset from the example of house resale prices in [MI] mi estimate. Our first step is to identify where our result of interest is stored. The regress command is an e-class command and it stores \(R^2\) in e(r2):

. webuse mhouses1993s30
(Albuquerque home prices Feb15–Apr30, 1993)

. mi xeq 0: regress price tax sqft age nfeatures ne custom corner

m=0 data:
-> regress price tax sqft age nfeatures ne custom corner


      Source         SS           df       MS      Number of obs   =        66
      F(7, 58)        =     51.86
       Model    9164658.61         7  1309236.94    Prob > F        =    0.0000
    Residual    1464105.15        58  25243.1922    R-squared       =    0.8623
      Adj R-squared   =    0.8456
       Total    10628763.8        65  163519.442    Root MSE        =    158.88



       price   Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
   
         tax     .4988701   .1584853     3.15   0.003     .1816273    .8161128
        sqft     .3522184   .0957476     3.68   0.001     .1605588    .5438779
         age    -.5650817   2.002529    -0.28   0.779     -4.57358    3.443416
   nfeatures     4.389607   18.55499     0.24   0.814    -32.75223    41.53145
          ne    -17.38534   47.27462    -0.37   0.714    -112.0158     77.2451
      custom     174.9411   53.72371     3.26   0.002     67.40139    282.4808
      corner    -73.58234   49.13007    -1.50   0.140    -171.9269    24.76218
       _cons      92.7448    101.607     0.91   0.365    -110.6438    296.1334



. ereturn list

scalars:
                  e(N) =  66
               e(df_m) =  7
               e(df_r) =  58
                  e(F) =  51.86495180785406
                 e(r2) =  .8622506644760125
		(output omitted)

We used mi xeq 0: in the command line above to execute the regress command on the original data m=0.

Now we want to compute the average of the \(R^2\) values. To do so, we can use mi xeq: to run regressions on each imputed dataset and collect individual \(R^2\) values:

. mi query
data mi set mlong, M = 30

. local M = r(M)

. scalar r2 = 0

. mi xeq 1/`M': regress price tax sqft age nfeatures ne custom corner; 
     scalar r2 = r2 + e(r2)

m=1 data:
-> regress price tax sqft age nfeatures ne custom corner


      Source         SS           df       MS      Number of obs   =       117
      F(7, 109)       =     74.79
       Model    13895850.5         7   1985121.5    Prob > F        =    0.0000
    Residual    2893096.28       109  26542.1677    R-squared       =    0.8277
      Adj R-squared   =    0.8166
       Total    16788946.8       116    144732.3    Root MSE        =    162.92



       price   Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
   
         tax      .676841   .1140902     5.93   0.000      .450718    .9029641
        sqft     .2091065    .065971     3.17   0.002     .0783542    .3398589
         age    -.2350972   1.472105    -0.16   0.873    -3.152762    2.682568
   nfeatures      4.17071     12.572     0.33   0.741    -20.74658      29.088
          ne      14.4082    33.3729     0.43   0.667    -51.73581    80.55221
      custom     137.1073   41.98151     3.27   0.001     53.90127    220.3132
      corner    -75.24024    39.1843    -1.92   0.057    -152.9023    2.421776
       _cons     149.6721   66.01572     2.27   0.025     18.83105    280.5131

-> scalar r2 = r2 + e(r2)

...
(output omitted)
...

m=30 data:
-> regress price tax sqft age nfeatures ne custom corner


      Source         SS           df       MS      Number of obs   =       117
      F(7, 109)       =     71.60
       Model    13790079.1         7   1970011.3    Prob > F        =    0.0000
    Residual    2998867.71       109  27512.5478    R-squared       =    0.8214
      Adj R-squared   =    0.8099
       Total    16788946.8       116    144732.3    Root MSE        =    165.87



       price   Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
   
         tax     .6938792   .1261562     5.50   0.000     .4438417    .9439168
        sqft     .2044596   .0710267     2.88   0.005     .0636869    .3452322
         age     .3372971   1.591275     0.21   0.833    -2.816558    3.491152
   nfeatures     6.940631   13.04224     0.53   0.596    -18.90865    32.78992
          ne     7.280361   36.79407     0.20   0.844     -65.6443    80.20502
      custom      132.975   43.24393     3.07   0.003     47.26696    218.6831
      corner    -70.26249   39.74756    -1.77   0.080    -149.0409    8.515876
       _cons     128.2627   67.20399     1.91   0.059    -4.933393    261.4588

-> scalar r2 = r2 + e(r2)

. scalar r2 = r2/`M'

. di as txt "R2 over imputed data = " as res r2
R2 over imputed data = .82300853

We used mi query above to identify the number of imputations and to store it in the local macro M. Then we used mi xeq 1/`M': to run regressions on the imputed data and accumulate the \(R^2\) values in the r2 scalar. We used a semicolon to separate the two commands we wished to execute on each imputed datum; see [MI] mi xeq for details. Finally, to obtain the average, we divided r2 by the number of imputations.

We recommend applying Rubin’s combination rules to parameters in a metric for which the asymptotic normal approximation works well. Fisher’s z, or inverse hyperbolic tangent [atanh()], transformation is often recommended for the correlation coefficient \(R\) to improve its asymptotic normality. Thus we can combine the \(R^2\) values in the transformed metric and then use the inverse transformation, tanh(), to switch back to the original metric.

. local M = 30

. scalar r2 = 0

. qui mi xeq 1/`M': regress price tax sqft age nfeatures ne custom corner; scalar 
     r2 = r2 + atanh(sqrt(e(r2)))

. scalar r2 = tanh(r2/`M')^2

. di as txt "R2 using Fisher's z over imputed data = " as res r2
R2 using Fisher's z over imputed data = .82306058

The two estimates of the \(R^2\) are very similar.

Combining point estimates and their variances

If we want to combine point estimates and their variances, we can use mi estimate. However, we need to create an e-class program (see program) that saves the necessary results where mi estimate expects to see them. The general guidelines are described in program properties.

Below we demonstrate how to use mi estimate to combine results from r-class command roctab using the heart attack example from [MI] mi estimate. The roctab command posts the estimate of the area under the curve (AUC) in r(area) and its standard error in r(se):

. webuse mheart1s20
(Fictional heart attack data; BMI missing)

. qui mi passive: generate high_bmi = (bmi>30) if bmi<.

. mi xeq 0: roctab attack high_bmi

m=0 data:
-> roctab attack high_bmi


                      ROC                     Asymptotic normal  
           Obs       area     Std. err.      [95% conf. interval]
     
           132     0.5452       0.0322        0.48208     0.60833


. return list

scalars:
                  r(N) =  132
               r(area) =  .5452035889029503
                 r(se) =  .0322055279124565
                 r(lb) =  .4820819140914361
                 r(ub) =  .6083252637144645
              r(level) =  95

In the above series of commands, we created a new variable indicating high body mass index (BMI) values, high_bmi, which we use as a classification variable in the receiver operating characteristic (ROC) analysis of heart attacks. Because bmi is the imputed variable, high_bmi (being a function of bmi) is a passive variable. As such, we used mi passive: generate to create the passive variable high_bmi; see mi register and mi passive for details.

Following the guidelines in program properties, we need to create an e-class program that stores the AUC estimate and its standard error in e(b) and e(V). We also need to store other results, such as the name of the command in e(cmd), the number of observations in e(N), and the title of the command in e(title).

In the following code, the e-class program eroctab is defined. eroctab calls roctab and posts necessary results to e(); see ereturn for details. This program accepts two arguments: the name of the reference variable (stored in the local macro refvar) and the name of the classification variable (stored in the local macro classvar). In step 1, we perform ROC analysis of variables in refvar and classvar using roctab. In step 2, we save the estimates of the area and its variance in temporary matrices b and V, respectively. Then in step 3, we label columns of coefficient matrix `b' and rows and columns of the VCE matrix `V' consistently, as required by ereturn post. Finally, in step 4, we post coefficient and VCE matrices as well as other results to e().

cap program drop eroctab
program eroctab, eclass
        version 19.0

        /* Step 1: perform ROC analysis */
        args refvar classvar
        roctab `refvar' `classvar'

        /* Step 2: save estimate and its variance in temporary matrices*/
        tempname b V
        mat `b' = r(area)
        mat `V' = r(se)^2
        local N = r(N)

        /* Step 3: make column names and row names consistent*/
        mat colnames `b' = AUC
        mat colnames `V' = AUC
        mat rownames `V' = AUC

        /*Step 4: post results to e()*/
        ereturn post `b' `V', obs(`N')
        ereturn local cmd "eroctab"
        ereturn local title "ROC area"
end

Inference about the AUC estimate in roctab is based on the large-sample normal approximation. Sometimes commands, for example regress, adjust for small samples and use Student’s t distribution for inference. In such cases, the corresponding (residual or denominator) degrees of freedom must be posted to e(df_r). This can be done by specifying the dof() option with ereturn post.

Now we can use eroctab with the mi estimate: prefix to obtain multiple-imputation estimates of the ROC area:

. mi estimate, cmdok: eroctab attack high_bmi


Multiple-imputation estimates                   Imputations       =         20
ROC area                                        Number of obs     =        154
                                                Average RVI       =     0.0753
                                                Largest FMI       =     0.0705
DF adjustment:   Large sample                   DF:     min       =   3,873.11
                                                        avg       =   3,873.11
                                                        max       =   3,873.11



               Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
   
         AUC     .5450938   .0312929    17.42   0.000     .4837416    .6064459

In the command line above, we used the cmdok option with mi estimate because eroctab is not one of the estimation commands officially supported by mi estimate. Alternatively, we could have included the mi property in the definition of the program eroctab to use it with mi estimate directly:

cap program drop eroctab
program eroctab, eclass properties(mi)
        ...
end

. mi estimate: eroctab attack high_bmi

An example using suest

Another situation when you need to define your own program is when your results are not obtained via a single command, but by using a sequence of commands such as, for example, with suest.

Below we demonstrate an example in which we want to compare the effects of smoking on heart attacks from two logistic models: one adjusting for both age and BMI and the other adjusting only for BMI.

First, we create a new program, mysuest, which fits two models of interest and then uses suest to combine the two estimation results. The models of interest are passed as two arguments and are stored in local macros model1 and model2. To avoid specifying the cmdok option with mi estimate, we add properties(mi) to the definition of the mysuest program.

cap program drop mysuest
program mysuest, eclass properties(mi)
        version 19.0
	args model1 model2

        qui `model1'
        estimates store est1
        qui `model2'
        estimates store est2
        suest est1 est2
        estimates drop est1 est2
        
        ereturn local title "Seemingly unrelated estimation"
end

Because suest is an estimation command and already stores the appropriate results in e(), we did not need to manually post (or repost) anything to e() except e(title), which is not stored by suest.

We can now use mysuest with mi estimate:

. webuse mheart1s20, clear
(Fictional heart attack data; BMI missing)

. mi estimate: mysuest "logit attack smokes age bmi" "logit attack smokes bmi"

Multiple-imputation estimates                   Imputations       =         20
Seemingly unrelated estimation                  Number of obs     =        154
                                                Average RVI       =     0.0541
                                                Largest FMI       =     0.1568
DF adjustment:   Large sample                   DF:     min       =     794.17
                                                        avg       = 234,946.42
Within VCE type:       Robust                           max       = 667,104.68



               Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
   
est1_attack                                                                   
      smokes     1.187115   .3574611     3.32   0.001      .486502    1.887728
         age     .0352896   .0162749     2.17   0.030     .0033913     .067188
         bmi      .102868   .0478936     2.15   0.032     .0089024    .1968336
       _cons    -5.312597    1.69795    -3.13   0.002    -8.641634   -1.983561
   
est2_attack                                                                   
      smokes     1.172281   .3496687     3.35   0.001     .4869414     1.85762
         bmi     .0921183   .0470325     1.96   0.051    -.0002044    .1844411
       _cons    -3.038422   1.229553    -2.47   0.014    -5.451692   -.6251518

Recall that within multiple-imputation framework, to test the equality of coefficients, we must first estimate their difference and then use mi testtransform to test the hypothesis; see [MI] mi test for details.

So, to test the coefficients on smokes from the two logistic models, we first estimate their difference with mi estimate. To display only transformed results, we specify the nocoef option:

. mi estimate (diff: [est1_attack]smokes - [est2_attack]smokes), nocoef: 
     mysuest "logit attack smokes age bmi" "logit attack smokes bmi"

Multiple-imputation estimates                   Imputations       =         20
Seemingly unrelated estimation                  Number of obs     =        154
                                                Average RVI       =     0.0047
                                                Largest FMI       =     0.0047
DF adjustment:   Large sample                   DF:     min       = 866,779.11
                                                        avg       = 866,779.11
Within VCE type:       Robust                           max       = 866,779.11

         diff: [est1_attack]smokes - [est2_attack]smokes



               Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
   
        diff     .0148342    .079597     0.19   0.852    -.1411732    .1708417

Although the t test, which is automatically reported by mi estimate, is sufficient for testing the hypothesis of no difference between the two coefficient estimates of smokes, we can also use the following mi testtransform command to test this hypothesis:

. mi testtransform diff
note: assuming equal fractions of missing information.

         diff: [est1_attack]smokes - [est2_attack]smokes

 ( 1)  diff = 0

       F(  1,866779.1) =    0.03
            Prob > F =    0.8522

We do not have sufficient evidence to reject the null hypothesis of the equality of the smoking effects from the considered logistic models.

We could easily extend our mysuest program to allow more than two models to be combined by suest.

How can I combine results other than coefficients in e(b) with multiply imputed data?

Finding out where your results are saved

Combining point estimates only

Combining point estimates and their variances

An example using suest

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies

Source	SS df MS	Number of obs = 66
		F(7, 58) = 51.86
Model	9164658.61 7 1309236.94	Prob > F = 0.0000
Residual	1464105.15 58 25243.1922	R-squared = 0.8623
		Adj R-squared = 0.8456
Total	10628763.8 65 163519.442	Root MSE = 158.88


price		Coefficient Std. err. t P>\|t\| [95% conf. interval]

tax		.4988701 .1584853 3.15 0.003 .1816273 .8161128
sqft		.3522184 .0957476 3.68 0.001 .1605588 .5438779
age		-.5650817 2.002529 -0.28 0.779 -4.57358 3.443416
nfeatures		4.389607 18.55499 0.24 0.814 -32.75223 41.53145
ne		-17.38534 47.27462 -0.37 0.714 -112.0158 77.2451
custom		174.9411 53.72371 3.26 0.002 67.40139 282.4808
corner		-73.58234 49.13007 -1.50 0.140 -171.9269 24.76218
_cons		92.7448 101.607 0.91 0.365 -110.6438 296.1334

Source	SS df MS	Number of obs = 117
		F(7, 109) = 74.79
Model	13895850.5 7 1985121.5	Prob > F = 0.0000
Residual	2893096.28 109 26542.1677	R-squared = 0.8277
		Adj R-squared = 0.8166
Total	16788946.8 116 144732.3	Root MSE = 162.92


price		Coefficient Std. err. t P>\|t\| [95% conf. interval]

tax		.676841 .1140902 5.93 0.000 .450718 .9029641
sqft		.2091065 .065971 3.17 0.002 .0783542 .3398589
age		-.2350972 1.472105 -0.16 0.873 -3.152762 2.682568
nfeatures		4.17071 12.572 0.33 0.741 -20.74658 29.088
ne		14.4082 33.3729 0.43 0.667 -51.73581 80.55221
custom		137.1073 41.98151 3.27 0.001 53.90127 220.3132
corner		-75.24024 39.1843 -1.92 0.057 -152.9023 2.421776
_cons		149.6721 66.01572 2.27 0.025 18.83105 280.5131


price		Coefficient Std. err. t P>\|t\| [95% conf. interval]

tax		.6938792 .1261562 5.50 0.000 .4438417 .9439168
sqft		.2044596 .0710267 2.88 0.005 .0636869 .3452322
age		.3372971 1.591275 0.21 0.833 -2.816558 3.491152
nfeatures		6.940631 13.04224 0.53 0.596 -18.90865 32.78992
ne		7.280361 36.79407 0.20 0.844 -65.6443 80.20502
custom		132.975 43.24393 3.07 0.003 47.26696 218.6831
corner		-70.26249 39.74756 -1.77 0.080 -149.0409 8.515876
_cons		128.2627 67.20399 1.91 0.059 -4.933393 261.4588

	ROC Asymptotic normal
	Obs area Std. err. [95% conf. interval]

	132 0.5452 0.0322 0.48208 0.60833


		Coefficient Std. err. t P>\|t\| [95% conf. interval]

AUC		.5450938 .0312929 17.42 0.000 .4837416 .6064459


		Coefficient Std. err. t P>\|t\| [95% conf. interval]

est1_attack
smokes		1.187115 .3574611 3.32 0.001 .486502 1.887728
age		.0352896 .0162749 2.17 0.030 .0033913 .067188
bmi		.102868 .0478936 2.15 0.032 .0089024 .1968336
_cons		-5.312597 1.69795 -3.13 0.002 -8.641634 -1.983561

est2_attack
smokes		1.172281 .3496687 3.35 0.001 .4869414 1.85762
bmi		.0921183 .0470325 1.96 0.051 -.0002044 .1844411
_cons		-3.038422 1.229553 -2.47 0.014 -5.451692 -.6251518


		Coefficient Std. err. t P>\|t\| [95% conf. interval]

diff		.0148342 .079597 0.19 0.852 -.1411732 .1708417

Stata/MP4 Annual License (download)

How can I combine results other than coefficients in e(b) with multiply imputed data?

Finding out where your results are saved

Combining point estimates only

Combining point estimates and their variances

An example using suest

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies