Highlights
Model choice, inference, and prediction
Model enumeration and MC3 sampling
Always-included and grouped predictors
Model priors: uniform, binomial, beta-binomial
Many g-priors, including hyper-g and robust
Factor variables and time-series operators
Strong, weak, or no heredity for interactions
BMA MCMC convergence
Posterior model and inclusion probabilities
Prior and posterior model-size distributions
Posterior distributions of model parameters
Jointness measures for pairs of predictors
Variable-inclusion maps
Log predictive-score for model fit and prediction performance
Sensitivity analysis
Prediction
Support for standard Bayesian postestimation features
See more Bayesian model averaging features
See more Bayesian analysis features
Why choose just one model when you can borrow information from many? The new bma suite performs Bayesian model averaging (BMA) to account for model uncertainty in your analysis. Uncertain which predictors to include in your linear regression model? Use bmaregress to find out which predictors are important. Perform model choice, inference, and prediction. Use the many postestimation commands to explore influential models, assess model complexity, evaluate model fit and predictive performance, check the sensitivity of results to assumptions about the importance of models and predictors, and more.
Traditionally, we select a model and perform inference and prediction conditional on this model. Our results typically do not account for the uncertainty in selecting a model and thus may be overly optimistic. They may even be incorrect if our selected model is substantially different from the true data-generating model (DGM). In some applications, we may have strong theoretical or empirical evidence about the DGM. In other applications, typically those involving complex and unstable processes, as in economics, psychology, and epidemiology, choosing one reliable model can be difficult.
Instead of relying on just one model, model averaging averages results over multiple plausible models based on the observed data. In BMA, the "plausibility" of a model is described by its posterior model probability (PMP), which is determined by Bayes' theorem, a fundamental Bayesian principle that applies universally to all data analyses.
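In standard BMA notation, for candidate models \(M_1, \ldots, M_K\), Bayes' theorem yields the PMP of model \(M_j\),

\[
P(M_j \mid {\mathbf y}) = \frac{P({\mathbf y} \mid M_j)\,P(M_j)}{\sum_{k=1}^K P({\mathbf y} \mid M_k)\,P(M_k)},
\]

and BMA bases inference on the posterior distribution of a quantity of interest \(\theta\) averaged over models, weighted by their PMPs:

\[
p(\theta \mid {\mathbf y}) = \sum_{j=1}^K p(\theta \mid {\mathbf y}, M_j)\,P(M_j \mid {\mathbf y}).
\]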
BMA can be used to account for model uncertainty when estimating model parameters and predicting new observations to avoid overly optimistic conclusions. It is particularly useful in applications with several plausible models, where there is no one definitive reason to choose a particular model over the others. But even if choosing a single model is the end goal, you can find BMA beneficial. It provides a principled way to identify important models and predictors within the considered classes of models. Its framework allows you to learn about interrelations between different predictors in terms of their tendency to appear in a model together, separately, or independently. It can be used to evaluate the sensitivity of the final results to various assumptions about the importance of different models and predictors. And it provides optimal predictions in the log-score sense.
In a regression setting, model uncertainty amounts to the uncertainty about which predictors should be included in a model. We can use bmaregress to account for selection of predictors in a linear regression. It explores the model space either exhaustively with the enumeration option, whenever feasible, or by using the Markov chain Monte Carlo (MCMC) model composition (MC3) algorithm with the sampling option. It reports various summaries of the visited models and included predictors and of posterior distributions of model parameters. It allows you to specify groups of predictors to be included in or excluded from a model together, as well as predictors that are included in all models. It provides various prior distributions for models in the mprior() option and for the g parameter, which controls shrinkage of regression coefficients toward zero, in the gprior() option. It also supports factor variables and time-series operators and provides several ways of handling interactions during estimation by using the heredity() option.
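For illustration, a schematic bmaregress call combining several of these options might look as follows. This is a sketch, not output from the example below: the variables are hypothetical placeholders, the data are assumed to be tsset so that the lag operator L. can be used, and the exact option syntax is documented in [BMA] bmaregress.

. bmaregress y x1 x2 c.x1#c.x2 (x3 x4, always) L.x5, sampling mprior(uniform) heredity(strong) rseed(18)

Here x3 and x4 form a group that is always included, sampling requests the MC3 algorithm, mprior(uniform) places a uniform prior over models, and heredity(strong) requires that both x1 and x2 appear in any model that includes their interaction.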
Many postestimation features are supported, including some of the standard Bayesian postestimation features:
| Command | Description |
|---|---|
| bmacoefsample | posterior samples of regression coefficients |
| bmagraph pmp | model-probability plots |
| bmagraph msize | model-size distribution plots |
| bmagraph varmap | variable-inclusion maps |
| bmagraph coefdensity | coefficient posterior density plots |
| bmastats models | posterior model and variable-inclusion summaries |
| bmastats msize | model-size summary |
| bmastats pip | posterior inclusion probabilities (PIPs) for predictors |
| bmastats jointness | jointness measures for predictors |
| bmastats lps | log predictive-score (LPS) |
| bmapredict | BMA predictions |
| bayesgraph | Bayesian graphical summaries and convergence diagnostics |
| bayesstats summary | Bayesian summary statistics for model parameters and their functions |
| bayesstats ess | Bayesian effective sample sizes and related statistics |
| bayesstats ppvalues | Bayesian predictive p-values |
| bayespredict | Bayesian predictions |
The tour below covers the following topics:
Model enumeration
Credible intervals
Influential models
Important predictors
Model-size distribution
Posterior distributions of coefficients
BMA prediction
Informative model priors
Predictive performance using LPS
Cleanup
Below, we provide a tour of some bma features using a toy dataset. This dataset contains \(n=200\) observations, \(p=10\) orthogonal predictors, and outcome y generated as
\({\mathtt y} = {0.5} + {1.2}\times{\mathtt x2} + {5}\times{\mathtt x10} + \epsilon\)
where \(\epsilon\sim N(0,1)\) is a standard normal error term.
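The dataset ships with Stata, but for reference, data with this structure could be simulated along the following lines. This is a sketch, not the code actually used to create bmaintro: the seed is arbitrary, and independent rnormal() draws yield only approximately orthogonal predictors.

. clear
. set obs 200
. set seed 18
. forvalues i = 1/10 {
.     generate x`i' = rnormal()
. }
. generate y = 0.5 + 1.2*x2 + 5*x10 + rnormal()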
. webuse bmaintro
(Simulated data for BMA example)

. summarize
| Variable | Obs | Mean | Std. dev. | Min | Max |
|---|---|---|---|---|---|
| y | 200 | .9944997 | 4.925052 | -13.332 | 13.06587 |
| x1 | 200 | -.0187403 | .9908957 | -3.217909 | 2.606215 |
| x2 | 200 | -.0159491 | 1.098724 | -2.999594 | 2.566395 |
| x3 | 200 | .080607 | 1.007036 | -3.016552 | 3.020441 |
| x4 | 200 | .0324701 | 1.004683 | -2.410378 | 2.391406 |
| x5 | 200 | -.0821737 | .9866885 | -2.543018 | 2.133524 |
| x6 | 200 | .0232265 | 1.006167 | -2.567606 | 3.840835 |
| x7 | 200 | -.1121034 | .9450883 | -3.213471 | 1.885638 |
| x8 | 200 | -.0668903 | .9713769 | -2.871328 | 2.808912 |
| x9 | 200 | -.1629013 | .9550258 | -2.647837 | 2.472586 |
| x10 | 200 | .083902 | .8905923 | -2.660675 | 2.275681 |
We use bmaregress to fit a BMA linear regression of y on x1 through x10.
. bmaregress y x1-x10
Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                No. of obs         =    200
Linear regression                       No. of predictors  =     10
Model enumeration                       Groups             =     10
                                        Always             =      0
Priors:                                 No. of models      =  1,024
  Models: Beta-binomial(1, 1)           For CPMP >= .9     =      9
   Cons.: Noninformative                Mean model size    =  2.479
   Coef.: Zellner's g
        g: Benchmark, g = 200           Shrinkage, g/(1+g) = 0.9950
  sigma2: Noninformative                Mean sigma2        =  1.272
| y | Mean | Std. dev. | Group | PIP |
|---|---|---|---|---|
| x2 | 1.198105 | .0733478 | 2 | 1 |
| x10 | 5.08343 | .0900953 | 10 | 1 |
| x3 | -.0352493 | .0773309 | 3 | .21123 |
| x9 | .004321 | .0265725 | 9 | .051516 |
| x1 | .0033937 | .0232163 | 1 | .046909 |
| x4 | -.0020407 | .0188504 | 4 | .039267 |
| x5 | .0005972 | .0152443 | 5 | .033015 |
| x8 | -.0005639 | .0153214 | 8 | .032742 |
| x7 | -8.23e-06 | .015497 | 7 | .032386 |
| x6 | -.0003648 | .0143983 | 6 | .032361 |
| Always | | | | |
| _cons | .5907923 | .0804774 | 0 | 1 |
With 10 predictors and the default fixed (benchmark) g-prior, bmaregress uses model enumeration and explores all \(2^{10}=1,024\) possible models. The default model prior is beta-binomial with shape parameters of 1, which is uniform over the model size. A priori, our BMA model assumes little shrinkage of regression coefficients toward zero because \(g/(1+g) = 0.9950\) is close to 1.
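For instance, the defaults could be spelled out explicitly, or the model prior could be replaced with a uniform prior over models, along these lines. This is a sketch; the mprior(betabinomial 1 1) and gprior(benchmark) specifications reflect our reading of the syntax, so see [BMA] bmaregress for the exact forms:

. bmaregress y x1-x10, mprior(betabinomial 1 1) gprior(benchmark)
. bmaregress y x1-x10, mprior(uniform)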
bmaregress identified x2 and x10 as highly important predictors—their posterior inclusion probabilities (PIPs) are essentially 1. The posterior mean estimates of their regression coefficients are, respectively, 1.2 (rounded) and 5.1 and are very close to the true values of 1.2 and 5. All other predictors have much lower PIPs and coefficient estimates close to zero. Our BMA findings are consistent with the true DGM.
Let's store our current estimation results for later use. As with other Bayesian commands, we must first save the simulation results before we can use estimates store to store the estimation results.
. bmaregress, saving(bmareg)
note: file bmareg.dta saved.

. estimates store bmareg
bmaregress does not report credible intervals (CrIs) by default because computing them requires a potentially time-consuming simulation of the posterior sample of model parameters. But we can compute them after estimation.
We first use bmacoefsample to generate an MCMC sample from the posterior distribution of model parameters, which include regression coefficients. We then use the existing bayesstats summary command to compute posterior summaries of the generated MCMC sample.
. bmacoefsample, rseed(18)
Simulation (10000): ....5000....10000 done

. bayesstats summary

Posterior summary statistics                 MCMC sample size = 10,000
| | Mean | Std. dev. | MCSE | Median | Equal-tailed [95% cred. interval] |
|---|---|---|---|---|---|
| y | | | | | |
| x1 | .0031927 | .0234241 | .000234 | 0 | [0, .0605767] |
| x2 | 1.197746 | .0726358 | .000735 | 1.197211 | [1.053622, 1.341076] |
| x3 | -.0361581 | .0780037 | .00078 | 0 | [-.2604694, 0] |
| x4 | -.0021015 | .018556 | .000186 | 0 | [-.0306376, 0] |
| x5 | .0004701 | .0147757 | .000148 | 0 | [0, 0] |
| x6 | -.0003859 | .0140439 | .000142 | 0 | [0, 0] |
| x7 | -.0003311 | .0166303 | .000166 | 0 | [0, 0] |
| x8 | -.0005519 | .0145717 | .00015 | 0 | [0, 0] |
| x9 | .0046535 | .0273899 | .000274 | 0 | [0, .096085] |
| x10 | 5.08357 | .0907759 | .000927 | 5.083466 | [4.90354, 5.262716] |
| _cons | .5901334 | .0811697 | .000801 | .5905113 | [.4302853, .7505722] |
| sigma2 | 1.272579 | .1300217 | .0013 | 1.262612 | [1.043772, 1.555978] |
| g | 200 | 0 | 0 | 200 | [200, 200] |
The 95% CrI for the coefficient of x2 is [1.05, 1.34], and that for x10 is [4.9, 5.26]; both are consistent with our DGM.
The BMA coefficient estimates are averaged over 1,024 models. It is important to investigate which models are influential. We can use bmastats models to explore the PMPs.
. bmastats models

Model summary

Number of models:
  Visited  = 1,024
  Reported = 5
| Rank | Analytical PMP | Model size |
|---|---|---|
| 1 | .6292 | 2 |
| 2 | .1444 | 3 |
| 3 | .0258 | 3 |
| 4 | .0246 | 3 |
| 5 | .01996 | 3 |

Variable-inclusion summary (x marks inclusion):

| | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 |
|---|---|---|---|---|---|
| x2 | x | x | x | x | x |
| x10 | x | x | x | x | x |
| x3 | | x | | | |
| x9 | | | x | | |
| x1 | | | | x | |
| x4 | | | | | x |
Not surprisingly, the model that contains both x2 and x10 has the highest PMP of about 63%. In fact, the top two models correspond to a cumulative PMP of about 77%:
. bmastats models, cumulative(0.75)
Computing model probabilities ...

Model summary

Number of models:
  Visited  = 1,024
  Reported = 2
| Rank | Analytical CPMP | Model size |
|---|---|---|
| 1 | .6292 | 2 |
| 2 | .7736 | 3 |

Variable-inclusion summary (x marks inclusion):

| | Rank 1 | Rank 2 |
|---|---|---|
| x2 | x | x |
| x10 | x | x |
| x3 | | x |
We can use bmagraph pmp to plot the cumulative PMP.
. bmagraph pmp, cumulative
The command plots the first 100 models by default, but you can specify the top() option if you would like to see more models.
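For example, to plot the cumulative PMP for the top 200 models, we could type

. bmagraph pmp, cumulative top(200)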
We can explore the importance of predictors visually by using bmagraph varmap to produce a variable-inclusion map.
. bmagraph varmap
Computing model probabilities ...
x2 and x10 are included in all top 100 models, ranked by PMP. Their coefficients are positive in all models.
We can assess the complexity of our BMA model by using bmastats msize and bmagraph msize to examine the prior and posterior model-size distributions.
. bmastats msize

Model-size summary

Number of models = 1,024
Model size:
  Minimum = 0
  Maximum = 10
| | Mean | Median |
|---|---|---|
| Prior | | |
| Analytical | 5.0000 | 5 |
| Posterior | | |
| Analytical | 2.4794 | 2 |
. bmagraph msize
The prior model-size distribution is uniform over the model size. The posterior model-size distribution is skewed to the left, with a mode of about 2. The prior mean model size is 5, and the posterior mean is 2.48. So, based on the observed data, BMA favors smaller models with roughly two predictors, on average, relative to our a priori assumption.
We can use bmagraph coefdensity to plot posterior distributions of regression coefficients.
. bmagraph coefdensity {x2} {x3}, combine
These distributions are mixtures of a point mass at zero, corresponding to the predictor's probability of noninclusion, and a continuous density conditional on inclusion. For the coefficient of x2, the probability of noninclusion is negligible, so its posterior distribution is essentially a continuous, fairly symmetric density centered at 1.2. For the coefficient of x3, in addition to the conditional continuous density, there is a red vertical line at zero corresponding to the posterior noninclusion probability of roughly \(1 - 0.21 = 0.79\).
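In notation, the marginal posterior density of a coefficient \(\beta_j\) is the mixture

\[
p(\beta_j \mid {\mathbf y}) = \left(1 - {\rm PIP}_j\right) \delta_0(\beta_j) + {\rm PIP}_j \; p(\beta_j \mid {\mathbf y},\, x_j \in {\rm model}),
\]

where \(\delta_0(\cdot)\) denotes a point mass at zero and \({\rm PIP}_j\) is the posterior inclusion probability of \(x_j\).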
bmapredict produces various BMA predictions. For instance, let's compute the posterior predictive means.
. bmapredict pmean, mean
note: computing analytical posterior predictive means.
We can also compute predictive CrIs to estimate uncertainty about our predictions. (This is not available with many traditional model-selection methods.) Note that these predictive CrIs also incorporate model uncertainty. To compute predictive CrIs, we must save the posterior MCMC model-parameter sample first.
. bmacoefsample, saving(bmacoef)
note: saving existing MCMC simulation results without resampling; specify option simulate to force resampling in this case.
note: file bmacoef.dta saved.

. bmapredict cri_l cri_u, cri rseed(18)
note: computing credible intervals using simulation.
Computing predictions ...
We summarize the predictions:
. summarize y pmean cri*
| Variable | Obs | Mean | Std. dev. | Min | Max |
|---|---|---|---|---|---|
| y | 200 | .9944997 | 4.925052 | -13.332 | 13.06587 |
| pmean | 200 | .9944997 | 4.783067 | -13.37242 | 12.31697 |
| cri_l | 200 | -1.24788 | 4.787499 | -15.66658 | 10.03054 |
| cri_u | 200 | 3.227426 | 4.779761 | -11.06823 | 14.58301 |
The reported averages over observations for predictive means and lower and upper 95% predictive CrI bounds look reasonable relative to the observed outcome y.
One of the strengths of BMA is the ability to incorporate prior information about models and predictors. This allows us to investigate the sensitivity of the results to various assumptions about the importance of models and predictors. Suppose that we have reliable information that all predictors except x2 and x10 are less likely to be related to the outcome. We can specify, for example, a binomial model prior with low prior inclusion probabilities for these predictors.
To compare the out-of-sample performance of the BMA models later, we randomly split our sample into two parts and fit the BMA model using the first subsample. We also save the results from this more informative BMA model.
. splitsample, generate(sample) nsplit(2) rseed(18)

. bmaregress y x1-x10 if sample == 1, mprior(binomial x2 x10 0.5 x1 x3-x9 0.05) saving(bmareg_inf)
Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                No. of obs         =    100
Linear regression                       No. of predictors  =     10
Model enumeration                       Groups             =     10
                                        Always             =      0
Priors:                                 No. of models      =  1,024
  Models: Binomial, IP varies           For CPMP >= .9     =      1
   Cons.: Noninformative                Mean model size    =  2.072
   Coef.: Zellner's g
        g: Benchmark, g = 100           Shrinkage, g/(1+g) = 0.9901
  sigma2: Noninformative                Mean sigma2        =  1.268
| y | Mean | Std. dev. | Group | PIP |
|---|---|---|---|---|
| x2 | 1.168763 | .1031096 | 2 | 1 |
| x10 | 4.920726 | .124615 | 10 | 1 |
| x1 | .0019244 | .0204242 | 1 | .013449 |
| x5 | -.0018262 | .0210557 | 5 | .011973 |
| x3 | -.0017381 | .0205733 | 3 | .011557 |
| x4 | -.0015444 | .0193858 | 4 | .010709 |
| Always | | | | |
| _cons | .5637673 | .113255 | 0 | 1 |
The effect of this model prior is that the PIPs of all unimportant predictors are now less than 2%.
Note that when we select one model, we are essentially fitting a BMA model with a very strong prior that all selected predictors must be included with a probability of 1. For instance, we can force bmaregress to include all variables in the model:
. bmaregress y (x1-x10, always) if sample == 1, saving(bmareg_all)
Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                No. of obs         =    100
Linear regression                       No. of predictors  =     10
Model enumeration                       Groups             =      0
                                        Always             =     10
Priors:                                 No. of models      =      1
  Models: Beta-binomial(1, 1)           For CPMP >= .9     =      1
   Cons.: Noninformative                Mean model size    = 10.000
   Coef.: Zellner's g
        g: Benchmark, g = 100           Shrinkage, g/(1+g) = 0.9901
  sigma2: Noninformative                Mean sigma2        =  1.192
| y | Mean | Std. dev. | Group | PIP |
|---|---|---|---|---|
| Always | | | | |
| x1 | .1294521 | .105395 | 0 | 1 |
| x2 | 1.166679 | .1129949 | 0 | 1 |
| x3 | -.1433074 | .1271903 | 0 | 1 |
| x4 | -.1032189 | .1223152 | 0 | 1 |
| x5 | -.0819008 | .1261309 | 0 | 1 |
| x6 | .0696633 | .1057512 | 0 | 1 |
| x7 | .0222949 | .1215404 | 0 | 1 |
| x8 | -.0252311 | .1124352 | 0 | 1 |
| x9 | .0587412 | .1166921 | 0 | 1 |
| x10 | 4.949992 | .1276795 | 0 | 1 |
| _cons | .6006153 | .1127032 | 0 | 1 |
For comparison of the considered BMA models, we refit our default BMA model using the first subsample and store the estimation results.
. qui bmaregress y x1-x10 if sample == 1, saving(bmareg, replace)

. estimates store bmareg
We can compare the out-of-sample predictive performance of our BMA models by using bmastats lps to compute the log predictive-score (LPS) for out-of-sample observations.
. bmastats lps bmareg bmareg_inf bmareg_all if sample == 2, compact

Log predictive-score (LPS)

Number of observations = 100
| LPS | Mean | Minimum | Maximum |
|---|---|---|---|
| bmareg | 1.562967 | 1.039682 | 6.778834 |
| bmareg_inf | 1.562238 | 1.037576 | 6.883794 |
| bmareg_all | 1.576231 | 1.032793 | 6.084414 |
The more informative bmareg_inf model has a slightly smaller (better) mean LPS, but the LPS summaries for all three models are very similar. See [BMA] bmastats lps for how to compare BMA model performance by using cross-validation.
We generated several datasets during our analysis. We no longer need them, so we erase them at the end. But you may decide to keep them, especially if they correspond to a potentially time-consuming final analysis.
. erase bmareg.dta
. erase bmacoef.dta
. erase bmareg_inf.dta
. erase bmareg_all.dta
See Motivating examples in [BMA] Intro for more detailed analyses of the above toy dataset and comparison of BMA results to linear regression, stepwise selection, and lasso, as well as for the use of BMA with \(n \leq p\). In [BMA] bmaregress, for a full tour of bma, see Getting started examples; for model choice and inference, see BMA analysis of cross-country economic growth data; and for predictions, see BMA predictive performance for the USA crime rate data and Remarks and examples of [BMA] bmapredict.
Read more about Bayesian model averaging in the Stata Bayesian Model Averaging Reference Manual.
View all the new features in Stata 18 and, in particular, New in Bayesian analysis.