Highlights
Model choice, inference, and prediction
Model enumeration and MC3 sampling
Always-included and grouped predictors
Model priors: uniform, binomial, beta-binomial
Many g-priors, including hyper-g and robust
Factor variables and time-series operators
Strong, weak, or no heredity for interactions
BMA MCMC convergence
Posterior model and inclusion probabilities
Prior and posterior model-size distributions
Posterior distributions of model parameters
Jointness measures for pairs of predictors
Variable-inclusion maps
Log predictive-score for model fit and prediction performance
Sensitivity analysis
Prediction
Support for standard Bayesian postestimation features
See more Bayesian model averaging features
See more Bayesian analysis features
Why choose just one model when you can borrow information from many? The new bma suite performs Bayesian model averaging (BMA) to account for model uncertainty in your analysis. Uncertain which predictors to include in your linear regression model? Use bmaregress to find out which predictors are important. Perform model choice, inference, and prediction. Use the many postestimation commands to explore influential models, assess model complexity, evaluate model fit and predictive performance, check the sensitivity of results to assumptions about the importance of models and predictors, and more.
Traditionally, we select a model and perform inference and prediction conditional on this model. Our results typically do not account for the uncertainty in selecting a model and thus may be overly optimistic. They may even be incorrect if our selected model is substantially different from the true data-generating model (DGM). In some applications, we may have strong theoretical or empirical evidence about the DGM. In other applications, typically those involving complex and unstable processes, as in economics, psychology, and epidemiology, choosing one reliable model can be difficult.
Instead of relying on just one model, model averaging averages results over multiple plausible models based on the observed data. In BMA, the "plausibility" of a model is described by its posterior model probability (PMP), which is determined by Bayes' theorem, a fundamental Bayesian principle that applies universally to all data analyses.
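In standard BMA notation, for candidate models \(M_1, \ldots, M_K\), Bayes' theorem yields the PMP of model \(M_j\),

\[
P(M_j \mid {\mathbf y}) = \frac{P({\mathbf y} \mid M_j)\,P(M_j)}{\sum_{k=1}^K P({\mathbf y} \mid M_k)\,P(M_k)},
\]

and BMA bases inference on the posterior distribution of a quantity of interest \(\theta\) averaged over models, weighted by their PMPs:

\[
p(\theta \mid {\mathbf y}) = \sum_{j=1}^K p(\theta \mid {\mathbf y}, M_j)\,P(M_j \mid {\mathbf y}).
\]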
BMA can be used to account for model uncertainty when estimating model parameters and predicting new observations to avoid overly optimistic conclusions. It is particularly useful in applications with several plausible models, where there is no one definitive reason to choose a particular model over the others. But even if choosing a single model is the end goal, you can find BMA beneficial. It provides a principled way to identify important models and predictors within the considered classes of models. Its framework allows you to learn about interrelations between different predictors in terms of their tendency to appear in a model together, separately, or independently. It can be used to evaluate the sensitivity of the final results to various assumptions about the importance of different models and predictors. And it provides optimal predictions in the log-score sense.
In a regression setting, model uncertainty amounts to the uncertainty about which predictors should be included in a model. We can use bmaregress to account for selection of predictors in a linear regression. It explores the model space either exhaustively with the enumeration option, whenever feasible, or by using the Markov chain Monte Carlo (MCMC) model composition (MC3) algorithm with the sampling option. It reports various summaries of the visited models and included predictors and of posterior distributions of model parameters. It allows you to specify groups of predictors to be included in or excluded from a model together, as well as predictors that are included in all models. It provides various prior distributions for models in the mprior() option and for the g parameter, which controls shrinkage of regression coefficients toward zero, in the gprior() option. It also supports factor variables and time-series operators and provides several ways of handling interactions during estimation by using the heredity() option.
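For illustration, a schematic bmaregress call combining several of these options might look as follows. This is a sketch, not output from the example below: the variables are hypothetical placeholders, the data are assumed to be tsset so that the lag operator L. can be used, and the exact option syntax is documented in [BMA] bmaregress.

. bmaregress y x1 x2 c.x1#c.x2 (x3 x4, always) L.x5, sampling mprior(uniform) heredity(strong) rseed(18)

Here x3 and x4 form a group that is always included, sampling requests the MC3 algorithm, mprior(uniform) places a uniform prior over models, and heredity(strong) requires that both x1 and x2 appear in any model that includes their interaction.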
Many postestimation features are supported, including some of the standard Bayesian postestimation features:
| Command | Description |
|---|---|
| bmacoefsample | posterior samples of regression coefficients |
| bmagraph pmp | model-probability plots |
| bmagraph msize | model-size distribution plots |
| bmagraph varmap | variable-inclusion maps |
| bmagraph coefdensity | coefficient posterior density plots |
| bmastats models | posterior model and variable-inclusion summaries |
| bmastats msize | model-size summary |
| bmastats pip | posterior inclusion probabilities (PIPs) for predictors |
| bmastats jointness | jointness measures for predictors |
| bmastats lps | log predictive-score (LPS) |
| bmapredict | BMA predictions |
| bayesgraph | Bayesian graphical summaries and convergence diagnostics |
| bayesstats summary | Bayesian summary statistics for model parameters and their functions |
| bayesstats ess | Bayesian effective sample sizes and related statistics |
| bayesstats ppvalues | Bayesian predictive p-values |
| bayespredict | Bayesian predictions |
The tour below covers the following topics:
Model enumeration
Credible intervals
Influential models
Important predictors
Model-size distribution
Posterior distributions of coefficients
BMA prediction
Informative model priors
Predictive performance using LPS
Cleanup
Below, we provide a tour of some bma features using a toy dataset. This dataset contains \(n=200\) observations, \(p=10\) orthogonal predictors, and outcome y generated as
\({\mathtt y} = {0.5} + {1.2}\times{\mathtt x2} + {5}\times{\mathtt x10} + \epsilon\)
where \(\epsilon\sim N(0,1)\) is a standard normal error term.
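The dataset ships with Stata, but for reference, data with this structure could be simulated along the following lines. This is a sketch, not the code actually used to create bmaintro: the seed is arbitrary, and independent rnormal() draws yield only approximately orthogonal predictors.

. clear
. set obs 200
. set seed 18
. forvalues i = 1/10 {
.     generate x`i' = rnormal()
. }
. generate y = 0.5 + 1.2*x2 + 5*x10 + rnormal()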
. webuse bmaintro
(Simulated data for BMA example)

. summarize
| Variable | Obs | Mean | Std. dev. | Min | Max |
|---|---|---|---|---|---|
| y | 200 | .9944997 | 4.925052 | -13.332 | 13.06587 |
| x1 | 200 | -.0187403 | .9908957 | -3.217909 | 2.606215 |
| x2 | 200 | -.0159491 | 1.098724 | -2.999594 | 2.566395 |
| x3 | 200 | .080607 | 1.007036 | -3.016552 | 3.020441 |
| x4 | 200 | .0324701 | 1.004683 | -2.410378 | 2.391406 |
| x5 | 200 | -.0821737 | .9866885 | -2.543018 | 2.133524 |
| x6 | 200 | .0232265 | 1.006167 | -2.567606 | 3.840835 |
| x7 | 200 | -.1121034 | .9450883 | -3.213471 | 1.885638 |
| x8 | 200 | -.0668903 | .9713769 | -2.871328 | 2.808912 |
| x9 | 200 | -.1629013 | .9550258 | -2.647837 | 2.472586 |
| x10 | 200 | .083902 | .8905923 | -2.660675 | 2.275681 |
We use bmaregress to fit a BMA linear regression of y on x1 through x10.
. bmaregress y x1-x10
Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                No. of obs         =    200
Linear regression                       No. of predictors  =     10
Model enumeration                       Groups             =     10
                                        Always             =      0
Priors:                                 No. of models      =  1,024
  Models: Beta-binomial(1, 1)           For CPMP >= .9     =      9
   Cons.: Noninformative                Mean model size    =  2.479
   Coef.: Zellner's g
        g: Benchmark, g = 200           Shrinkage, g/(1+g) = 0.9950
  sigma2: Noninformative                Mean sigma2        =  1.272
| y | Mean | Std. dev. | Group | PIP |
|---|---|---|---|---|
| x2 | 1.198105 | .0733478 | 2 | 1 |
| x10 | 5.08343 | .0900953 | 10 | 1 |
| x3 | -.0352493 | .0773309 | 3 | .21123 |
| x9 | .004321 | .0265725 | 9 | .051516 |
| x1 | .0033937 | .0232163 | 1 | .046909 |
| x4 | -.0020407 | .0188504 | 4 | .039267 |
| x5 | .0005972 | .0152443 | 5 | .033015 |
| x8 | -.0005639 | .0153214 | 8 | .032742 |
| x7 | -8.23e-06 | .015497 | 7 | .032386 |
| x6 | -.0003648 | .0143983 | 6 | .032361 |
| Always | | | | |
| _cons | .5907923 | .0804774 | 0 | 1 |
With 10 predictors and the default fixed (benchmark) g-prior, bmaregress uses model enumeration and explores all \(2^{10}=1,024\) possible models. The default model prior is beta-binomial with shape parameters of 1, which is uniform over the model size. A priori, our BMA model assumes little shrinkage of regression coefficients toward zero because \(g/(1+g) = 0.9950\) is close to 1.
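For instance, the defaults could be spelled out explicitly, or the model prior could be replaced with a uniform prior over models, along these lines. This is a sketch; the mprior(betabinomial 1 1) and gprior(benchmark) specifications reflect our reading of the syntax, so see [BMA] bmaregress for the exact forms:

. bmaregress y x1-x10, mprior(betabinomial 1 1) gprior(benchmark)
. bmaregress y x1-x10, mprior(uniform)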
bmaregress identified x2 and x10 as highly important predictors—their posterior inclusion probabilities (PIPs) are essentially 1. The posterior mean estimates of their regression coefficients are, respectively, 1.2 (rounded) and 5.1 and are very close to the true values of 1.2 and 5. All other predictors have much lower PIPs and coefficient estimates close to zero. Our BMA findings are consistent with the true DGM.
Let's store our current estimation results for later use. As with other Bayesian commands, we must first save the simulation results before we can use estimates store to store the estimation results.
. bmaregress, saving(bmareg)
note: file bmareg.dta saved.

. estimates store bmareg
bmaregress does not report credible intervals (CrIs) by default because computing them requires a potentially time-consuming simulation of the posterior sample of model parameters. But we can compute them after estimation.
We first use bmacoefsample to generate an MCMC sample from the posterior distribution of model parameters, which include regression coefficients. We then use the existing bayesstats summary command to compute posterior summaries of the generated MCMC sample.
. bmacoefsample, rseed(18)
Simulation (10000): ....5000....10000 done

. bayesstats summary

Posterior summary statistics                 MCMC sample size = 10,000
| | Mean | Std. dev. | MCSE | Median | Equal-tailed [95% cred. interval] |
|---|---|---|---|---|---|
| y | | | | | |
| x1 | .0031927 | .0234241 | .000234 | 0 | [0, .0605767] |
| x2 | 1.197746 | .0726358 | .000735 | 1.197211 | [1.053622, 1.341076] |
| x3 | -.0361581 | .0780037 | .00078 | 0 | [-.2604694, 0] |
| x4 | -.0021015 | .018556 | .000186 | 0 | [-.0306376, 0] |
| x5 | .0004701 | .0147757 | .000148 | 0 | [0, 0] |
| x6 | -.0003859 | .0140439 | .000142 | 0 | [0, 0] |
| x7 | -.0003311 | .0166303 | .000166 | 0 | [0, 0] |
| x8 | -.0005519 | .0145717 | .00015 | 0 | [0, 0] |
| x9 | .0046535 | .0273899 | .000274 | 0 | [0, .096085] |
| x10 | 5.08357 | .0907759 | .000927 | 5.083466 | [4.90354, 5.262716] |
| _cons | .5901334 | .0811697 | .000801 | .5905113 | [.4302853, .7505722] |
| sigma2 | 1.272579 | .1300217 | .0013 | 1.262612 | [1.043772, 1.555978] |
| g | 200 | 0 | 0 | 200 | [200, 200] |
The 95% CrI for the coefficient of x2 is [1.05, 1.34], and that for x10 is [4.9, 5.26]; both are consistent with our DGM.
The BMA coefficient estimates are averaged over 1,024 models. It is important to investigate which models are influential. We can use bmastats models to explore the PMPs.
. bmastats models

Model summary

Number of models:
  Visited  = 1,024
  Reported = 5
| Rank | Analytical PMP | Model size |
|---|---|---|
| 1 | .6292 | 2 |
| 2 | .1444 | 3 |
| 3 | .0258 | 3 |
| 4 | .0246 | 3 |
| 5 | .01996 | 3 |

Variable-inclusion summary (x marks inclusion):

| | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 |
|---|---|---|---|---|---|
| x2 | x | x | x | x | x |
| x10 | x | x | x | x | x |
| x3 | | x | | | |
| x9 | | | x | | |
| x1 | | | | x | |
| x4 | | | | | x |
Not surprisingly, the model that contains both x2 and x10 has the highest PMP of about 63%. In fact, the top two models correspond to a cumulative PMP of about 77%:
. bmastats models, cumulative(0.75)
Computing model probabilities ...

Model summary

Number of models:
  Visited  = 1,024
  Reported = 2
| Rank | Analytical CPMP | Model size |
|---|---|---|
| 1 | .6292 | 2 |
| 2 | .7736 | 3 |

Variable-inclusion summary (x marks inclusion):

| | Rank 1 | Rank 2 |
|---|---|---|
| x2 | x | x |
| x10 | x | x |
| x3 | | x |
We can use bmagraph pmp to plot the cumulative PMP.
. bmagraph pmp, cumulative
The command plots the first 100 models by default, but you can specify the top() option if you would like to see more models.
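For example, to plot the cumulative PMP for the top 200 models, we could type

. bmagraph pmp, cumulative top(200)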
We can explore the importance of predictors visually by using bmagraph varmap to produce a variable-inclusion map.
. bmagraph varmap
Computing model probabilities ...
x2 and x10 are included in all top 100 models, ranked by PMP. Their coefficients are positive in all models.
We can assess the complexity of our BMA model by using bmastats msize and bmagraph msize to examine the prior and posterior model-size distributions.
. bmastats msize

Model-size summary

Number of models = 1,024
Model size:
  Minimum = 0
  Maximum = 10
| | Mean | Median |
|---|---|---|
| Prior | | |
| Analytical | 5.0000 | 5 |
| Posterior | | |
| Analytical | 2.4794 | 2 |
. bmagraph msize
The prior model-size distribution is uniform over the model size. The posterior model-size distribution is skewed to the left, with a mode of about 2. The prior mean model size is 5, and the posterior mean is 2.48. So, based on the observed data, BMA favors smaller models with roughly two predictors, on average, relative to our a priori assumption.
We can use bmagraph coefdensity to plot posterior distributions of regression coefficients.
. bmagraph coefdensity {x2} {x3}, combine
These distributions are mixtures of a point mass at zero, corresponding to the predictor's probability of noninclusion, and a continuous density conditional on inclusion. For the coefficient of x2, the probability of noninclusion is negligible, so its posterior distribution is essentially a continuous, fairly symmetric density centered at 1.2. For the coefficient of x3, in addition to the conditional continuous density, there is a red vertical line at zero corresponding to the posterior noninclusion probability of roughly \(1 - 0.21 = 0.79\).
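In notation, the marginal posterior density of a coefficient \(\beta_j\) is the mixture

\[
p(\beta_j \mid {\mathbf y}) = \left(1 - {\rm PIP}_j\right) \delta_0(\beta_j) + {\rm PIP}_j \; p(\beta_j \mid {\mathbf y},\, x_j \in {\rm model}),
\]

where \(\delta_0(\cdot)\) denotes a point mass at zero and \({\rm PIP}_j\) is the posterior inclusion probability of \(x_j\).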
bmapredict produces various BMA predictions. For instance, let's compute the posterior predictive means.
. bmapredict pmean, mean
note: computing analytical posterior predictive means.
We can also compute predictive CrIs to estimate uncertainty about our predictions. (This is not available with many traditional model-selection methods.) Note that these predictive CrIs also incorporate model uncertainty. To compute predictive CrIs, we must save the posterior MCMC model-parameter sample first.
. bmacoefsample, saving(bmacoef)
note: saving existing MCMC simulation results without resampling; specify option simulate to force resampling in this case.
note: file bmacoef.dta saved.

. bmapredict cri_l cri_u, cri rseed(18)
note: computing credible intervals using simulation.
Computing predictions ...
We summarize the predictions:
. summarize y pmean cri*
| Variable | Obs | Mean | Std. dev. | Min | Max |
|---|---|---|---|---|---|
| y | 200 | .9944997 | 4.925052 | -13.332 | 13.06587 |
| pmean | 200 | .9944997 | 4.783067 | -13.37242 | 12.31697 |
| cri_l | 200 | -1.24788 | 4.787499 | -15.66658 | 10.03054 |
| cri_u | 200 | 3.227426 | 4.779761 | -11.06823 | 14.58301 |
The reported averages over observations for predictive means and lower and upper 95% predictive CrI bounds look reasonable relative to the observed outcome y.
One of the strengths of BMA is the ability to incorporate prior information about models and predictors. This allows us to investigate the sensitivity of the results to various assumptions about the importance of models and predictors. Suppose that we have reliable information that all predictors except x2 and x10 are less likely to be related to the outcome. We can specify, for example, a binomial model prior with low prior inclusion probabilities for these predictors.
To compare the out-of-sample performance of the BMA models later, we randomly split our sample into two parts and fit the BMA model using the first subsample. We also save the results from this more informative BMA model.
. splitsample, generate(sample) nsplit(2) rseed(18)

. bmaregress y x1-x10 if sample == 1, mprior(binomial x2 x10 0.5 x1 x3-x9 0.05) saving(bmareg_inf)
Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                No. of obs         =    100
Linear regression                       No. of predictors  =     10
Model enumeration                       Groups             =     10
                                        Always             =      0
Priors:                                 No. of models      =  1,024
  Models: Binomial, IP varies           For CPMP >= .9     =      1
   Cons.: Noninformative                Mean model size    =  2.072
   Coef.: Zellner's g
        g: Benchmark, g = 100           Shrinkage, g/(1+g) = 0.9901
  sigma2: Noninformative                Mean sigma2        =  1.268
| y | Mean | Std. dev. | Group | PIP |
|---|---|---|---|---|
| x2 | 1.168763 | .1031096 | 2 | 1 |
| x10 | 4.920726 | .124615 | 10 | 1 |
| x1 | .0019244 | .0204242 | 1 | .013449 |
| x5 | -.0018262 | .0210557 | 5 | .011973 |
| x3 | -.0017381 | .0205733 | 3 | .011557 |
| x4 | -.0015444 | .0193858 | 4 | .010709 |
| Always | | | | |
| _cons | .5637673 | .113255 | 0 | 1 |
The effect of this model prior is that the PIPs of all unimportant predictors are now less than 2%.
Note that when we select one model, we are essentially fitting a BMA model with a very strong prior that all selected predictors must be included with a probability of 1. For instance, we can force bmaregress to include all variables in the model:
. bmaregress y (x1-x10, always) if sample == 1, saving(bmareg_all)
Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                No. of obs         =    100
Linear regression                       No. of predictors  =     10
Model enumeration                       Groups             =      0
                                        Always             =     10
Priors:                                 No. of models      =      1
  Models: Beta-binomial(1, 1)           For CPMP >= .9     =      1
   Cons.: Noninformative                Mean model size    = 10.000
   Coef.: Zellner's g
        g: Benchmark, g = 100           Shrinkage, g/(1+g) = 0.9901
  sigma2: Noninformative                Mean sigma2        =  1.192
| y | Mean | Std. dev. | Group | PIP |
|---|---|---|---|---|
| Always | | | | |
| x1 | .1294521 | .105395 | 0 | 1 |
| x2 | 1.166679 | .1129949 | 0 | 1 |
| x3 | -.1433074 | .1271903 | 0 | 1 |
| x4 | -.1032189 | .1223152 | 0 | 1 |
| x5 | -.0819008 | .1261309 | 0 | 1 |
| x6 | .0696633 | .1057512 | 0 | 1 |
| x7 | .0222949 | .1215404 | 0 | 1 |
| x8 | -.0252311 | .1124352 | 0 | 1 |
| x9 | .0587412 | .1166921 | 0 | 1 |
| x10 | 4.949992 | .1276795 | 0 | 1 |
| _cons | .6006153 | .1127032 | 0 | 1 |
For comparison of the considered BMA models, we refit our default BMA model using the first subsample and store the estimation results.
. qui bmaregress y x1-x10 if sample == 1, saving(bmareg, replace)

. estimates store bmareg
We can compare the out-of-sample predictive performance of our BMA models by using bmastats lps to compute the log predictive-score (LPS) for out-of-sample observations.
. bmastats lps bmareg bmareg_inf bmareg_all if sample == 2, compact

Log predictive-score (LPS)

Number of observations = 100
| LPS | Mean | Minimum | Maximum |
|---|---|---|---|
| bmareg | 1.562967 | 1.039682 | 6.778834 |
| bmareg_inf | 1.562238 | 1.037576 | 6.883794 |
| bmareg_all | 1.576231 | 1.032793 | 6.084414 |
The more informative bmareg_inf model has a slightly smaller (better) mean LPS, but the LPS summaries for all three models are very similar. See [BMA] bmastats lps for how to compare BMA model performance by using cross-validation.
We generated several datasets during our analysis. We no longer need them, so we erase them at the end. But you may decide to keep them, especially if they correspond to a potentially time-consuming final analysis.
. erase bmareg.dta
. erase bmacoef.dta
. erase bmareg_inf.dta
. erase bmareg_all.dta
See Motivating examples in [BMA] Intro for more detailed analyses of the above toy dataset and comparison of BMA results to linear regression, stepwise selection, and lasso, as well as for the use of BMA with \(n \leq p\). In [BMA] bmaregress, for a full tour of bma, see Getting started examples; for model choice and inference, see BMA analysis of cross-country economic growth data; and for predictions, see BMA predictive performance for the USA crime rate data and Remarks and examples of [BMA] bmapredict.
Read more about Bayesian model averaging in the Stata Bayesian Model Averaging Reference Manual.
View all the new features in Stata 18 and, in particular, New in Bayesian analysis.