In the spotlight: In a world of uncertainty, BMA is a good friend to have
Let's use Bayesian model averaging (BMA) to investigate how mental health indicators influence time spent playing video games using an excerpt from this Kaggle dataset posted by Aditi Awasthi.
Usually, we choose one model and estimate parameters and other quantities of interest from the observed data conditional on this chosen model. For our example, we would like to fit a linear regression of hours spent playing video games on 41 predictors, including items from the Generalized Anxiety Disorder Inventory (GAD), the Satisfaction With Life Scale (SWL), the Social Phobia Inventory (SPIN), a narcissism measure, age, gender, employment status, and highest-obtained education degree.
. regress hoursplay gad1-age i.(gender employment degree)
      Source |       SS           df       MS      Number of obs   =     8,980
-------------+----------------------------------   F(41, 8938)     =      4.56
       Model |  67655.8921        41  1650.14371   Prob > F        =    0.0000
    Residual |  3235534.77     8,938  361.997625   R-squared       =    0.0205
-------------+----------------------------------   Adj R-squared   =    0.0160
       Total |  3303190.66     8,979   367.87957   Root MSE        =    19.026
------------------------------------------------------------------------------
   hoursplay | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        gad1 |  -.2484525   .3176057    -0.78   0.434    -.8710325    .3741276
        gad2 |  -.6537547   .3615262    -1.81   0.071    -1.362429     .0549195
        gad3 |   .0722776   .3137106     0.23   0.818    -.5426672    .6872224
        gad4 |   .4940691   .2959787     1.67   0.095    -.0861171    1.074255
        gad5 |   .0878802    .270851     0.32   0.746    -.4430499    .6188103
        gad6 |   .7852404   .2516806     3.12   0.002     .2918887    1.278592
        gad7 |   .2805871   .2930849     0.96   0.338    -.2939265    .8551006
        swl1 |   .0066118   .1825303     0.04   0.971    -.3511895     .364413
        swl2 |  -.4580919    .161555    -2.84   0.005    -.7747769    -.141407
        swl3 |   .0443038    .181529     0.24   0.807    -.3115347    .4001423
        swl4 |  -.0539451   .1581935    -0.34   0.733    -.3640406    .2561504
        swl5 |   .0128688   .1295983     0.10   0.921    -.2411736    .2669112
       spin1 |  -.2453695   .2566992    -0.96   0.339    -.7485588    .2578198
       spin2 |   .0493649   .2074694     0.24   0.812    -.3573226    .4560525
       spin3 |   .0831727   .2622274     0.32   0.751    -.4308532    .5971986
       spin4 |  -.2718726   .2324322    -1.17   0.242    -.7274931    .1837478
       spin5 |  -.2370667   .2308252    -1.03   0.304     -.689537    .2154037
       spin6 |  -.1637211   .2398618    -0.68   0.495    -.6339052     .306463
       spin7 |   .1243836   .1986672     0.63   0.531    -.2650498     .513817
       spin8 |   .4553621   .1998924     2.28   0.023     .0635271    .8471971
       spin9 |  -.1461319   .2036353    -0.72   0.473    -.5453039    .2530401
      spin10 |   .0705562   .2569985     0.27   0.784    -.4332197    .5743321
      spin11 |   .2240838   .1717869     1.30   0.192     -.112658    .5608256
      spin12 |   .2116736   .2405859     0.88   0.379      -.25993    .6832772
      spin13 |   .1469812   .2474796     0.59   0.553    -.3381357     .632098
      spin14 |    .098584   .2486039     0.40   0.692    -.3887366    .5859046
      spin15 |  -.2587051   .2133122    -1.21   0.225     -.676846    .1594357
      spin16 |   .0126282   .2652423     0.05   0.962    -.5073075    .5325638
      spin17 |   .0864269   .2182376     0.40   0.692    -.3413688    .5142226
  narcissism |   .4333946   .1921391     2.26   0.024     .0567578    .8100314
         age |  -.2757585   .0797467    -3.46   0.001    -.4320804   -.1194366
             |
      gender |
        Male |  -1.780342   .9040933    -1.97   0.049    -3.552572   -.0081118
       Other |   3.058195   3.270204     0.94   0.350    -3.352155    9.468544
             |
  employment |
          NA |   3.149185   4.010776     0.79   0.432    -4.712856    11.01123
Student at.. |    -1.2569   .5909958    -2.13   0.033    -2.415387   -.0984127
Student at.. |  -.4458813   .8405001    -0.53   0.596    -2.093454    1.201692
Unemployed.. |   4.132545   .8034745     5.14   0.000     2.557551     5.70754
             |
      degree |
High scho..) |   .9275215   .5741915     1.62   0.106    -.1980256    2.053069
Master (o..) |   .6352425   1.124561     0.56   0.572    -1.569156    2.839641
        None |   1.078527   .8941615     1.21   0.228    -.6742349    2.831289
Ph.D., Ps..) |  -.7402326   2.282285    -0.32   0.746    -5.214034    3.733569
             |
       _cons |   35.53717   2.573734    13.81   0.000     30.49206    40.58228
------------------------------------------------------------------------------
Hmm. Do we need all 41 of these predictors? Probably not, but we’re not sure exactly which ones to include. In other words, which model should we select? In the case of linear regression, methods like stepwise regression, Wald tests, or even Bayes factors can be used to select one from among candidate models, but why should we have to choose?
With BMA, we account for model uncertainty in our analysis by averaging results over all plausible models based on the observed data. We do this by considering a pool of candidate models and estimating each model’s posterior probability according to Bayes’s theorem. In the context of linear regression with p predictors, there are 2^p candidate models. We then estimate parameters by averaging parameter estimates across models, weighted by the models’ posterior probabilities.
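To make the weighting concrete, here it is in our own notation, with $y$ the observed data, $M_j$ a candidate model, and $\beta$ a coefficient of interest:

$$
P(M_j \mid y) = \frac{p(y \mid M_j)\,P(M_j)}{\sum_{k=1}^{2^p} p(y \mid M_k)\,P(M_k)},
\qquad
E(\beta \mid y) = \sum_{j} P(M_j \mid y)\, E(\beta \mid y, M_j),
$$

where $p(y \mid M_j)$ is the marginal likelihood of model $M_j$. A model that excludes a predictor contributes 0 to that predictor’s average, so coefficients of rarely included predictors are pulled toward zero.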
We can perform this analysis using Stata 18’s new bmaregress command. As with other estimation commands, we specify our dependent variable followed by our independent variables.
. bmaregress hoursplay gad1-age i.(gender employment degree)

Burn-in ...
Simulation ...
Computing model probabilities ...

Bayesian model averaging                     No. of obs           =      8,980
Linear regression                            No. of predictors    =         41
MC3 sampling                                           Groups     =         41
                                                       Always     =          0
                                             No. of models        =         52
                                                 For CPMP >= .9   =          6
Priors:                                      Mean model size      =      3.916
  Models: Beta-binomial(1, 1)                Burn-in              =      2,500
   Cons.: Noninformative                     MCMC sample size     =     10,000
   Coef.: Zellner's g                        Acceptance rate      =     0.0261
       g: Benchmark, g = 8,980               Shrinkage, g/(1+g)   =     0.9999
  sigma2: Noninformative                     Mean sigma2          =    362.484
                                             Sampling correlation =     0.9980
--------------------------------------------------------------------------
                       hoursplay |      Mean   Std. dev.   Group       PIP
---------------------------------+----------------------------------------
                      employment |
       Unemployed / between jobs |  5.060935   .7228275       37         1
                             age | -.2956931   .0635898       31    .99628
                            gad6 |  .8708031   .3563267        6    .90604
                            swl2 | -.4278654   .2250447        9    .83169
                           spin8 |  .0416567   .1368465       20   .094544
                      employment |
 Student at college / university | -.0384546    .219673       35   .034229
                      narcissism |  .0076027   .0626469       30   .017336
---------------------------------+----------------------------------------
                          Always |
                           _cons |  35.17864    1.94818        0         1
--------------------------------------------------------------------------
The top of the output tells us that 52 models were explored and that the 6 highest-ranked of them account for more than 90% of the cumulative posterior model probability (CPMP). The mean model size was 3.9, meaning the models explored included about 4 predictors on average.
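If we want more than the mean of the model size, the postestimation commands bmastats msize and bmagraph msize summarize and plot the prior and posterior model-size distributions (we assume default settings here):

. bmastats msize
. bmagraph msize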
The estimation table includes estimates of posterior means and standard deviations of the coefficients for each predictor. Averaging across the models explored here, we infer that unemployed participants are expected to spend 5.06 more hours playing video games than participants in the base employment category.
We also see estimated posterior inclusion probabilities (PIPs). The indicator for unemployment has a PIP of 1, meaning it was included in all explored models. The rest of the predictors are listed in decreasing order of their PIPs.
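Note that these posterior means average over models in which a predictor is excluded and its coefficient is exactly 0, so a rough conditional-on-inclusion estimate is the posterior mean divided by the PIP; for spin8, that is .0416567/.094544 ≈ 0.44, close to its least-squares estimate of 0.455 above. And to list PIPs beyond the predictors displayed by default, we can use another postestimation command:

. bmastats pip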
We can get more information about our top five models using the postestimation command bmastats models.
. bmastats models

Computing model probabilities ...

Model summary            Number of models:
                               Visited = 52
                              Reported =  5
----------------------------------------------
     |   Analytical      Frequency       Model
Rank |          PMP            PMP        size
-----+----------------------------------------
   1 |        .6256          .5683           4
   2 |        .1292          .1339           3
   3 |        .0663          .0916           3
   4 |       .04341          .0425           5
   5 |       .02787          .0257           4
----------------------------------------------
--------------------------------------------------------------------
                                 | Rank   Rank   Rank   Rank   Rank
                                 |    1      2      3      4      5
---------------------------------+----------------------------------
                            gad6 |    x      x             x      x
                            swl2 |    x             x      x
                             age |    x      x      x      x      x
                      employment |
       Unemployed / between jobs |    x      x      x      x      x
                           spin8 |                         x      x
--------------------------------------------------------------------
The highest-ranked model has a posterior model probability of 0.63 and includes four predictors. The variable-inclusion table shows which four: gad6, swl2, age, and the indicator for unemployment. The next highest-ranked model has a posterior model probability of 0.13 and includes three predictors: gad6, age, and unemployment.
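If we prefer a picture of these model probabilities, the postestimation command bmagraph pmp plots the PMPs of the top-ranked models (defaults assumed; see the manual for its options):

. bmagraph pmp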
We can visualize variable inclusions across all models using the postestimation command bmagraph varmap.
. bmagraph varmap

Computing model probabilities ...
In addition to being pretty, this variable-inclusion map provides an intuitive look at BMA. Each row is a predictor, and each block of columns is a model: blue signifies a positive regression coefficient, red a negative coefficient, and gray that the predictor is not included in that model.
The first block extends to a cumulative posterior probability of 0.63, indicating our top-ranked model. Again, we can see that it includes four predictors. Unemployment and item 6 of the GAD inventory, which asks about frequency of “Becoming easily annoyed or irritable”, increase the expected time spent playing video games, while age and item 2 of the SWL scale, which rates agreement with “The conditions of my life are excellent”, decrease expected time. In other words, for young, unemployed people who are not satisfied with the conditions of their life and get easily annoyed, video games are a popular escape.
The next block represents the second-ranked model, and so on. In general, we can see that these four predictors are included in most models and that the eighth item of SPIN, which asks about agreement with “I avoid going to parties”, is also sometimes included, with a positive coefficient. This makes sense. Parties probably aren’t very fun if you are annoyed with everyone there. We can explore this relationship further using bmastats jointness.
. bmastats jointness gad6 spin8

Computing model probabilities ...

Variables: gad6 spin8

-------------------------------------
                  |        Jointness
------------------+------------------
Doppelhofer–Weeks |        -1.064686
 Ley–Steel type 1 |         .0810813
 Ley–Steel type 2 |         .0882356
         Yule's Q |        -.4871701
-------------------------------------
These results suggest disjointness, meaning that if we already have one of these predictors in our model, we really don’t need the other.
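To see what these measures capture, let $p_{11}$ denote the total posterior probability of models containing both gad6 and spin8, $p_{00}$ that of models containing neither, and $p_{10}$ and $p_{01}$ that of models containing exactly one (our notation). The Doppelhofer–Weeks statistic is the log posterior odds ratio of joint inclusion, and Yule’s Q is a transformation of the same odds ratio onto $[-1, 1]$:

$$
\mathrm{DW} = \ln\frac{p_{11}\,p_{00}}{p_{10}\,p_{01}},
\qquad
Q = \frac{p_{11}p_{00} - p_{10}p_{01}}{p_{11}p_{00} + p_{10}p_{01}} = \frac{e^{\mathrm{DW}} - 1}{e^{\mathrm{DW}} + 1}.
$$

Indeed, $(e^{-1.064686} - 1)/(e^{-1.064686} + 1) \approx -0.487$, matching the output above. Negative values mean the two items appear together less often than they would if their inclusions were independent.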
Even with this short demonstration, you can already see how much more information we get about our data than if we had just fit one model. And, of course, because it accounts for model uncertainty, BMA can also produce better predictions than traditional single-model methods; we obtain them with bmapredict.
. bmapredict predicted_hours, mean
note: computing analytical posterior predictive means.
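Point predictions are only part of the story. One possible workflow for obtaining credible intervals around the predictions, sketched under the assumption that simulation results must be saved first (the file and variable names below are ours): run bmaregress with the saving() option, draw a posterior sample of the coefficients with bmacoefsample, and then request the intervals:

. bmaregress hoursplay gad1-age i.(gender employment degree), saving(bmasim)
. bmacoefsample, saving(bmacoef)
. bmapredict cri_lb cri_ub, cri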
Here we performed a simple BMA analysis to demonstrate the method and a few of Stata's tools. For better predictive power, we could consider adding additional covariates, interaction terms, nonlinear terms, or even splines to the model space that BMA explores.
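For example, we might let BMA weigh an age-by-narcissism interaction alongside the original predictors, using standard factor-variable notation (an illustrative specification, not a recommendation):

. bmaregress hoursplay gad1-age i.(gender employment degree) c.age#c.narcissism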
For further information about using BMA for prediction, model selection, and inference, including discussion on prior selection, variable grouping, and many more examples, take a tour of the Stata Bayesian Model Averaging Reference Manual.
— Meghan Cain
Assistant Director, Educational Services