

In the spotlight: Select predictors like a Bayesian–with probability

Bayesian variable selection is something of a misnomer in that variables are not “selected” per se. Rather, these methods account for the probability of variable inclusion when estimating regression coefficients. If a variable has a very low probability of being included in the model, its estimated regression coefficient will be strongly shrunk toward zero. By estimating variable inclusion and regression coefficients simultaneously, Bayesian variable-selection methods offer interpretable regression coefficients, computational efficiency, and predictive power.

Bayesian variable selection for linear regression is now part of Stata’s Bayesian suite with the new command bayesselect. It uses special priors for regression coefficients to “select” variables. bayesselect implements two main classes of such priors, with two options each:

  • Global–local shrinkage priors
    • Horseshoe priors
    • Bayesian lasso priors
  • Spike-and-slab priors
    • Mixture of normal distributions
    • Mixture of Laplace distributions

Global–local shrinkage priors

Global–local shrinkage priors estimate a shrinkage coefficient for each predictor (local) as well as one for the model as a whole (global). The HalfCauchy(0,1) prior, with location 0 and scale 1, is used for the global shrinkage coefficient. For local shrinkage, there are two options: horseshoe priors use HalfCauchy(0,1) priors, requested with option hshoe, and Bayesian lasso priors use Rayleigh(1) priors, requested with option blasso. Alternative scale parameters for each may be specified with options hshoe(scale) and blasso(scale).
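To make the structure behind these options concrete, here is a standard way to write the global–local hierarchy. It matches the HalfCauchy(0,1) and Rayleigh(1) choices described above, although the exact internal scaling used by glshrinkage() is documented in [BAYES] bayesselect, not here:

$$
\beta_j \mid \tau, \lambda_j \sim N(0,\ \tau^2\lambda_j^2), \qquad \tau \sim \mathrm{HalfCauchy}(0,1)
$$

$$
\lambda_j \sim \mathrm{HalfCauchy}(0,1)\ \text{(horseshoe, option hshoe)} \qquad \text{or} \qquad \lambda_j \sim \mathrm{Rayleigh}(1)\ \text{(Bayesian lasso, option blasso)}
$$

A large local scale lambda_j lets an individual coefficient escape the global shrinkage tau, which is how a few strong predictors can coexist with many near-zero ones.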

Here we use the horseshoe prior to perform Bayesian variable selection for the linear regression of y on x1-x10.

. webuse bmaintro

. bayesselect y x*, hshoe

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  y ~ normal(xb_y,{sigma2})

Priors:
  {y:x1 ... x10} ~ glshrinkage(1,{tau},{lambdas})                          (1)
       {y:_cons} ~ normal(0,10000)                                         (1)
        {sigma2} ~ jeffreys

Hyperprior:
  {tau lambdas} ~ halfcauchy(0,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_y.

Bayesian variable selection                     MCMC iterations  =     12,500
Metropolis–Hastings and Gibbs sampling          Burn-in          =      2,500
                                                MCMC sample size =     10,000
Global–local shrinkage coefficient prior:       Number of obs    =        200
  Horseshoe(1)                                  Acceptance rate  =      .8663
                                                Efficiency:  min =      .1246
                                                             avg =      .6779
Log marginal-likelihood = -299.54424                         max =          1
------------------------------------------------------------------------------
             |                                   Equal-tailed        Inclusion
           y |      Mean   Std. dev.     MCSE     [95% cred. interval]   coef.
-------------+----------------------------------------------------------------
         x10 |  5.118017    .085567   .0008711    4.948845   5.283815     1.00
          x2 |  1.187323   .0712223   .0007286    1.047547   1.328874     0.95
          x3 | -.1207944   .0849954    .002408   -.2946459   .0145886     0.49
          x9 |  .0464023   .0656988   .0013186   -.0579827   .1961329     0.34
          x1 |  .0345529   .0597626   .0011747   -.0646373   .1768837     0.31
          x4 | -.0237189   .0558671   .0007187   -.1533891   .0809342     0.30
          x8 | -.0121395   .0540153   .0005674    -.134576   .0938028     0.29
          x7 |  .0031087   .0545986   .0005503   -.1104932    .121574     0.28
          x6 | -.0055535   .0497625   .0004949    -.118772   .0954979     0.27
          x5 |  .0111128   .0518425    .000606   -.0914474   .1293249     0.27
------------------------------------------------------------------------------
------------------------------------------------------------------------------
             |                                               Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
y            |
       _cons |  .6041679   .0774303    .000774   .6044451    .452349  .7562248
-------------+----------------------------------------------------------------
      sigma2 |   1.16222    .120819    .002566   1.155618    .952355  1.425683
         tau |  .1943006   .1665629    .008659   .1482105   .0273504  .6280757
------------------------------------------------------------------------------

Two predictors, x10 and x2, have inclusion coefficients above 0.5 and are thus important predictors in this model.
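Before interpreting such results, it is good practice to check MCMC convergence. Assuming bayesselect works with the standard Bayesian postestimation commands, as other commands in the Bayesian suite do (an assumption on our part; see [BAYES] bayesselect for the definitive list), a graphical check for the coefficient on x10 might look like this:

. bayesgraph diagnostics {y:x10}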


Spike-and-slab priors

Spike-and-slab priors model each regression coefficient as a mixture of two priors centered at 0, one with less variance (the spike) and one with more variance (the slab). The spike, having less variance around 0, pulls its regression coefficients more strongly toward 0. The Beta(1,1) prior is used for the probability of selection into the spike or the slab; alternative shape parameters may be specified with option betaprior(a b). For regression coefficients, there are two options: normal mixture priors use Normal(0,0.01) for the spike and Normal(0,1) for the slab with option ssnormal, and Laplace mixture priors use Laplace(0,0.01) for the spike and Laplace(0,1) for the slab with option sslaplace. Alternative scale parameters for each may be specified with options ssnormal(sd1 sd2) and sslaplace(scale1 scale2).
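As a sketch of how those options combine, the hypothetical call below requests Laplace mixture priors with custom spike and slab scales and a Beta(2,8) prior that favors sparser models; the numeric values are purely illustrative, not recommendations:

. bayesselect y x*, sslaplace(.05 2) betaprior(2 8)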

Because each coefficient is assigned to either the spike or the slab, this class of priors separates estimates more sharply: coefficients in the spike are shrunk heavily toward 0, while those in the slab are left largely unshrunk, producing a wider range of estimated coefficients. These priors also directly model the probability of variable inclusion.
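Written out, the normal-mixture version of this prior has the hierarchy below, where gamma_j is the inclusion indicator for predictor j and 0.01 and 1 are the default spike and slab standard deviations. This is our reading of the model summary that follows; [BAYES] bayesselect has the exact parameterization:

$$
\beta_j \mid \gamma_j \sim (1-\gamma_j)\,N(0,\ 0.01^2) + \gamma_j\,N(0,\ 1^2), \qquad \gamma_j \sim \mathrm{Bernoulli}(\theta), \qquad \theta \sim \mathrm{Beta}(1,1)
$$

The posterior mean of each gamma_j is a direct estimate of that variable's inclusion probability, which is presumably what the inclusion coefficient column reports for these priors.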

We again perform Bayesian variable selection for the linear regression of y on x1-x10, this time using a mixture of normal distributions as spike-and-slab priors.

. bayesselect y x*, ssnormal

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  y ~ normal(xb_y,{sigma2})

Priors:
  {y:x1 ... x10} ~ mixnormal0(1,.01,1,{gammas})                            (1)
       {y:_cons} ~ normal(0,10000)                                         (1)
        {sigma2} ~ jeffreys

Hyperpriors:
  {gammas} ~ bernoulli({theta})
   {theta} ~ beta(1,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_y.

Bayesian variable selection                     MCMC iterations  =     12,500
Metropolis–Hastings and Gibbs sampling          Burn-in          =      2,500
                                                MCMC sample size =     10,000
Spike-and-slab coefficient prior:               Number of obs    =        200
  Normal mixture: N(0,.01) and N(0,1)           Acceptance rate  =      .8532
  Beta(1,1) for {theta}                         Efficiency:  min =     .02978
                                                             avg =      .5603
Log marginal-likelihood = -313.30188                         max =          1
------------------------------------------------------------------------------
             |                                   Equal-tailed        Inclusion
           y |      Mean   Std. dev.     MCSE     [95% cred. interval]   coef.
-------------+----------------------------------------------------------------
         x10 |  5.097081    .086673   .0008667    4.928525   5.266116     1.00
          x2 |  1.184789   .0717791   .0007178    1.043796   1.326506     1.00
          x3 |  -.072102   .1043113   .0060447   -.3100787   .0200108     0.39
          x9 |  .0045657   .0373229   .0005026   -.1151221   .0499449     0.16
          x1 |  .0048683   .0414482   .0008529   -.0305942   .1463594     0.14
          x4 |   .014639   .0431819   .0011083    -.019592   .1610175     0.13
          x8 |  .0100405   .0321387   .0003293   -.0563094    .085392     0.12
------------------------------------------------------------------------------
Note: 3 coefficients with inclusion values less than .1 not shown.
------------------------------------------------------------------------------
             |                                               Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
y            |
       _cons |  .6210694   .0792017    .000792   .6208332   .4660962  .7751767
-------------+----------------------------------------------------------------
      sigma2 |  1.169667   .1204309    .002662   1.163899   .9554763  1.430844
       theta |  .3441381   .1591277    .004623   .3279924    .085891  .6914042
------------------------------------------------------------------------------

The results based on the mixture of normal priors for regression coefficients are consistent with our earlier findings that x10 and x2 are important predictors.
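The two runs can also be compared on the log marginal-likelihoods reported in their headers (-299.54 for the horseshoe prior versus -313.30 for the normal mixture). If bayesselect, like bayesmh, allows saving its simulation results for postestimation (again an assumption on our part; we have not verified the option names here), a side-by-side comparison might look like this:

. bayesselect y x*, hshoe saving(hshoe_sim)
. estimates store hshoe
. bayesselect y x*, ssnormal saving(ssnorm_sim)
. estimates store ssnorm
. bayesstats ic hshoe ssnorm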

To view the full details of Bayesian variable selection, all the bayesselect options, and fully worked examples, see [BAYES] bayesselect.

— Meghan Cain
Assistant Director, Educational Services
