

In the spotlight: Select predictors like a Bayesian–with probability

Bayesian variable selection is something of a misnomer in that variables are not “selected” per se. Rather, these methods account for the probability of variable inclusion when estimating regression coefficients. If a variable has a very low probability of being included in the model, its estimated regression coefficient will be strongly shrunk toward zero. By estimating variable inclusion and regression coefficients simultaneously, Bayesian variable-selection methods offer interpretable regression coefficients, computational efficiency, and predictive power.

Bayesian variable selection for linear regression is now part of Stata’s Bayesian suite with the new command bayesselect. It uses special priors for regression coefficients to “select” variables. bayesselect implements two main classes of such priors, with two options each:

  • Global–local shrinkage priors
    • Horseshoe priors
    • Bayesian lasso priors
  • Spike-and-slab priors
    • Mixture of normal distributions
    • Mixture of Laplace distributions

Global–local shrinkage priors

Global–local shrinkage priors estimate a shrinkage coefficient for each predictor (local) as well as one for the model as a whole (global). The HalfCauchy(0,1) prior, with location 0 and scale 1, is used for the global shrinkage coefficient. For local shrinkage, there are two options: horseshoe priors use HalfCauchy(0,1) priors, requested with option hshoe, and Bayesian lasso priors use Rayleigh(1) priors, requested with option blasso. Alternative scale parameters for each may be specified with options hshoe(scale) and blasso(scale).
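To make the structure behind these options concrete, here is a standard way to write the global–local hierarchy. It matches the HalfCauchy(0,1) and Rayleigh(1) choices described above, although the exact internal scaling used by glshrinkage() is documented in [BAYES] bayesselect, not here:

$$
\beta_j \mid \tau, \lambda_j \sim N(0,\ \tau^2\lambda_j^2), \qquad \tau \sim \mathrm{HalfCauchy}(0,1)
$$

$$
\lambda_j \sim \mathrm{HalfCauchy}(0,1)\ \text{(horseshoe, option hshoe)} \qquad \text{or} \qquad \lambda_j \sim \mathrm{Rayleigh}(1)\ \text{(Bayesian lasso, option blasso)}
$$

A large local scale lambda_j lets an individual coefficient escape the global shrinkage tau, which is how a few strong predictors can coexist with many near-zero ones.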

Here we use the horseshoe prior to perform Bayesian variable selection for the linear regression of y on x1-x10.

. webuse bmaintro

. bayesselect y x*, hshoe

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  y ~ normal(xb_y,{sigma2})

Priors:
  {y:x1 ... x10} ~ glshrinkage(1,{tau},{lambdas})                          (1)
       {y:_cons} ~ normal(0,10000)                                         (1)
        {sigma2} ~ jeffreys

Hyperprior:
  {tau lambdas} ~ halfcauchy(0,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_y.

Bayesian variable selection                     MCMC iterations  =     12,500
Metropolis–Hastings and Gibbs sampling          Burn-in          =      2,500
                                                MCMC sample size =     10,000
Global–local shrinkage coefficient prior:       Number of obs    =        200
  Horseshoe(1)                                  Acceptance rate  =      .8663
                                                Efficiency:  min =      .1246
                                                             avg =      .6779
Log marginal-likelihood = -299.54424                         max =          1
------------------------------------------------------------------------------
             |                                   Equal-tailed        Inclusion
           y |      Mean   Std. dev.     MCSE     [95% cred. interval]   coef.
-------------+----------------------------------------------------------------
         x10 |  5.118017    .085567   .0008711    4.948845   5.283815     1.00
          x2 |  1.187323   .0712223   .0007286    1.047547   1.328874     0.95
          x3 | -.1207944   .0849954    .002408   -.2946459   .0145886     0.49
          x9 |  .0464023   .0656988   .0013186   -.0579827   .1961329     0.34
          x1 |  .0345529   .0597626   .0011747   -.0646373   .1768837     0.31
          x4 | -.0237189   .0558671   .0007187   -.1533891   .0809342     0.30
          x8 | -.0121395   .0540153   .0005674    -.134576   .0938028     0.29
          x7 |  .0031087   .0545986   .0005503   -.1104932    .121574     0.28
          x6 | -.0055535   .0497625   .0004949    -.118772   .0954979     0.27
          x5 |  .0111128   .0518425    .000606   -.0914474   .1293249     0.27
------------------------------------------------------------------------------
------------------------------------------------------------------------------
             |                                               Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
y            |
       _cons |  .6041679   .0774303    .000774   .6044451    .452349  .7562248
-------------+----------------------------------------------------------------
      sigma2 |   1.16222    .120819    .002566   1.155618    .952355  1.425683
         tau |  .1943006   .1665629    .008659   .1482105   .0273504  .6280757
------------------------------------------------------------------------------

Two predictors, x10 and x2, have inclusion coefficients above 0.5 and are thus important predictors in this model.
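Before interpreting such results, it is good practice to check MCMC convergence. Assuming bayesselect works with the standard Bayesian postestimation commands, as other commands in the Bayesian suite do (an assumption on our part; see [BAYES] bayesselect for the definitive list), a graphical check for the coefficient on x10 might look like this:

. bayesgraph diagnostics {y:x10}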


Spike-and-slab priors

Spike-and-slab priors model each regression coefficient as a mixture of two priors centered at 0, one with less variance (the spike) and one with more variance (the slab). The spike, having less variance around 0, pulls its regression coefficients more strongly toward 0. The Beta(1,1) prior is used for the probability of selection into the spike or the slab; alternative shape parameters may be specified with option betaprior(a b). For regression coefficients, there are two options: normal mixture priors use Normal(0,0.01) for the spike and Normal(0,1) for the slab with option ssnormal, and Laplace mixture priors use Laplace(0,0.01) for the spike and Laplace(0,1) for the slab with option sslaplace. Alternative scale parameters for each may be specified with options ssnormal(sd1 sd2) and sslaplace(scale1 scale2).
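As a sketch of how those options combine, the hypothetical call below requests Laplace mixture priors with custom spike and slab scales and a Beta(2,8) prior that favors sparser models; the numeric values are purely illustrative, not recommendations:

. bayesselect y x*, sslaplace(.05 2) betaprior(2 8)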

Because each coefficient is assigned to either the spike or the slab, this class of priors separates estimates more sharply: coefficients in the spike are shrunk heavily toward 0, while those in the slab are left largely unshrunk, producing a wider range of estimated coefficients. These priors also directly model the probability of variable inclusion.
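Written out, the normal-mixture version of this prior has the hierarchy below, where gamma_j is the inclusion indicator for predictor j and 0.01 and 1 are the default spike and slab standard deviations. This is our reading of the model summary that follows; [BAYES] bayesselect has the exact parameterization:

$$
\beta_j \mid \gamma_j \sim (1-\gamma_j)\,N(0,\ 0.01^2) + \gamma_j\,N(0,\ 1^2), \qquad \gamma_j \sim \mathrm{Bernoulli}(\theta), \qquad \theta \sim \mathrm{Beta}(1,1)
$$

The posterior mean of each gamma_j is a direct estimate of that variable's inclusion probability, which is presumably what the inclusion coefficient column reports for these priors.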

We again perform Bayesian variable selection for the linear regression of y on x1-x10, this time using a mixture of normal distributions as spike-and-slab priors.

. bayesselect y x*, ssnormal

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  y ~ normal(xb_y,{sigma2})

Priors:
  {y:x1 ... x10} ~ mixnormal0(1,.01,1,{gammas})                            (1)
       {y:_cons} ~ normal(0,10000)                                         (1)
        {sigma2} ~ jeffreys

Hyperpriors:
  {gammas} ~ bernoulli({theta})
   {theta} ~ beta(1,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_y.

Bayesian variable selection                     MCMC iterations  =     12,500
Metropolis–Hastings and Gibbs sampling          Burn-in          =      2,500
                                                MCMC sample size =     10,000
Spike-and-slab coefficient prior:               Number of obs    =        200
  Normal mixture: N(0,.01) and N(0,1)           Acceptance rate  =      .8532
  Beta(1,1) for {theta}                         Efficiency:  min =     .02978
                                                             avg =      .5603
Log marginal-likelihood = -313.30188                         max =          1
------------------------------------------------------------------------------
             |                                   Equal-tailed        Inclusion
           y |      Mean   Std. dev.     MCSE     [95% cred. interval]   coef.
-------------+----------------------------------------------------------------
         x10 |  5.097081    .086673   .0008667    4.928525   5.266116     1.00
          x2 |  1.184789   .0717791   .0007178    1.043796   1.326506     1.00
          x3 |  -.072102   .1043113   .0060447   -.3100787   .0200108     0.39
          x9 |  .0045657   .0373229   .0005026   -.1151221   .0499449     0.16
          x1 |  .0048683   .0414482   .0008529   -.0305942   .1463594     0.14
          x4 |   .014639   .0431819   .0011083    -.019592   .1610175     0.13
          x8 |  .0100405   .0321387   .0003293   -.0563094    .085392     0.12
------------------------------------------------------------------------------
Note: 3 coefficients with inclusion values less than .1 not shown.
------------------------------------------------------------------------------
             |                                               Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
y            |
       _cons |  .6210694   .0792017    .000792   .6208332   .4660962  .7751767
-------------+----------------------------------------------------------------
      sigma2 |  1.169667   .1204309    .002662   1.163899   .9554763  1.430844
       theta |  .3441381   .1591277    .004623   .3279924    .085891  .6914042
------------------------------------------------------------------------------

The results based on the mixture of normal priors for regression coefficients are consistent with our earlier findings that x10 and x2 are important predictors.
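The two runs can also be compared on the log marginal-likelihoods reported in their headers (-299.54 for the horseshoe prior versus -313.30 for the normal mixture). If bayesselect, like bayesmh, allows saving its simulation results for postestimation (again an assumption on our part; we have not verified the option names here), a side-by-side comparison might look like this:

. bayesselect y x*, hshoe saving(hshoe_sim)
. estimates store hshoe
. bayesselect y x*, ssnormal saving(ssnorm_sim)
. estimates store ssnorm
. bayesstats ic hshoe ssnorm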

To view the full details of Bayesian variable selection, all the bayesselect options, and fully worked examples, see [BAYES] bayesselect.

— Meghan Cain
Assistant Director, Educational Services
