In the spotlight: Bayesian quantile regression
Everyone loves a good model for the conditional expectation. But in many applications, it can make more sense to model the median or some other quantile, conditional on covariates. Through quantile regression, as implemented in the qreg command, we can use a linear combination of our regressors to explain a quantile of the dependent variable. The math of it all has been around for a long time in the frequentist literature and is actually not too different from ordinary least squares.
The Bayesian counterpart, however, is much more recent and is due to Yu and Moyeed (2001), who proposed using an asymmetric Laplace distribution for the likelihood function. Bayesian quantile regression combines this likelihood with priors for our model parameters and returns full posterior distributions of those parameters.
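For readers who want to see the connection, here is the standard form of the asymmetric Laplace density (a textbook result, not specific to Stata's implementation): with location $\mu$, scale $\sigma > 0$, and quantile $q \in (0,1)$,

$$
f(y \mid \mu, \sigma, q) \;=\; \frac{q(1-q)}{\sigma}\,
\exp\!\left\{-\rho_q\!\left(\frac{y-\mu}{\sigma}\right)\right\},
\qquad
\rho_q(u) \;=\; u\,\bigl(q - I(u < 0)\bigr),
$$

where $I(\cdot)$ is the indicator function. Maximizing this likelihood in $\mu = \mathbf{x}'\boldsymbol{\beta}$ is equivalent to minimizing the check-function loss $\sum_i \rho_q(y_i - \mathbf{x}_i'\boldsymbol{\beta})$ that defines classical quantile regression, which is what makes the asymmetric Laplace a natural working likelihood for the $q$th conditional quantile.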
As with so many other commands, Bayesian quantile regression in StataNow is now just a matter of typing a bayes prefix before your qreg command. For more complex tasks, like adding random intercepts or modeling multiple quantiles simultaneously, we can use our older friend bayesmh. I will take you through examples of both commands below.
And if you need some convincing to take these Bayesian methods for a spin with your data, keep in mind that the posterior standard deviations of Bayesian quantile estimation are model based and can be more efficient than the bootstrap-based and kernel-based standard errors of frequentist quantile models.
Let's see it work
The mathscores dataset contains the math scores of students in the third year (math3) and fifth year (math5) from different schools (school) in Inner London (Mortimore et al. 1988).
. webuse mathscores

. describe

Contains data from https://www.stata-press.com/data/r18/mathscores.dta
 Observations:           887
    Variables:             3                  9 May 2022 23:31
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
school          float   %9.0g                 School ID
math3           float   %9.0g                 Year 3 math score
math5           float   %9.0g                 Year 5 math score
-------------------------------------------------------------------------------

. summarize math3 math5

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       math3 |        887    .3607666    5.783583        -21         11
       math5 |        887    30.56595    6.666168          5         40
Without specifying any options, the bayes: qreg command fits a regression that models the median fifth-year math score as a function of third-year math scores:
. bayes, rseed(123): qreg math5 math3

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(xb_math5_q50,{sigma},.5)

Priors:
  {math5_q50:math3 _cons} ~ normal(0,10000)                                (1)
                  {sigma} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5_q50.

Bayesian quantile regression                     MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
Quantile = .5                                    Number of obs    =        887
                                                 Acceptance rate  =      .3559
                                                 Efficiency:  min =       .123
                                                              avg =      .1553
Log marginal-likelihood = -2815.4069                          max =      .2133

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5_q50    |
       math3 |  .5908475   .0343072   .000978    .590633    .523432   .6547106
       _cons |  31.29088   .1988516   .005522   31.28314   30.89681   31.67256
-------------+----------------------------------------------------------------
       sigma |   2.15015    .073256   .001586   2.148896    2.01285   2.299333
------------------------------------------------------------------------------
Notice that we specified a random-number seed so that you can reproduce this example. Taking the mean of the posterior distribution as our estimate of the effect of math3 on math5 (the first row of the Mean column in the output), we say that each additional point of math3 is associated with a median fifth-year math score that is 0.59 points higher. The output also provides the 95% credible interval [0.523, 0.655], whose endpoints are the 0.025 and 0.975 quantiles of the posterior distribution of the math3 slope.
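As with any MCMC-based method, it is good practice to check that the chain has converged before interpreting results. A quick visual check, not shown above, is the standard bayesgraph diagnostics postestimation command, which plots the trace, autocorrelation, histogram, and density of a parameter's MCMC sample:

. bayesgraph diagnostics {math5_q50:math3}

A trace plot that wanders without a stable mean, or autocorrelations that decay slowly, would suggest running a longer chain or adjusting the sampler.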
To model the 25th percentile (or 0.25 quantile) of math5 instead, we can simply type
. bayes, rseed(123): qreg math5 math3, quantile(0.25)

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(xb_math5_q25,{sigma},.25)

Priors:
  {math5_q25:math3 _cons} ~ normal(0,10000)                                (1)
                  {sigma} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5_q25.

Bayesian quantile regression                     MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
Quantile = .25                                   Number of obs    =        887
                                                 Acceptance rate  =      .3472
                                                 Efficiency:  min =      .1115
                                                              avg =      .1487
Log marginal-likelihood = -2958.6068                          max =      .1992

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5_q25    |
       math3 |  .8170189   .0314721   .000942   .8182132   .7535031   .8759191
       _cons |  26.90917   .1937895   .005269   26.91391   26.50618   27.26515
-------------+----------------------------------------------------------------
       sigma |  1.896149    .064245   .001439   1.894073   1.776397   2.026846
------------------------------------------------------------------------------
Here the mean of the posterior distribution of the math3 coefficient is 0.82, which indicates that each additional point of math3 is associated with a 0.82-point increase in the 25th percentile of math5 scores.
Of course, all existing Bayesian postestimation commands are available after bayes: qreg. For example, with the bayestest interval command, we can compute the posterior probability that the math3 coefficient in this last regression lies between 0.523 and 0.655 (that is, within the credible interval from our model for the median):
. bayestest interval {math5_q25:math3}, lower(0.523) upper(0.655)

Interval tests     MCMC sample size =    10,000

       prob1 : 0.523 < {math5_q25:math3} < .655

-----------------------------------------------
             |      Mean    Std. dev.      MCSE
-------------+---------------------------------
       prob1 |         0      0.00000         0
-----------------------------------------------
The posterior probability is essentially 0, suggesting that the effect of third-year scores on fifth-year scores differs between the 25th and 50th percentiles of math5. In other words, previous math performance appears to be more predictive of the fifth-year math performance of students who score poorly on the fifth-year exam than of the performance of students who score near the median.
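We can also look at this from the other direction. bayestest interval estimates the probability of any interval; because the posterior for this slope places essentially no mass above 1, the interval (0.655, 1) serves as a stand-in for "greater than 0.655" (these bounds are our own choice for illustration):

. bayestest interval {math5_q25:math3}, lower(0.655) upper(1)

A posterior probability near 1 here would confirm that the q25 slope almost surely exceeds the upper limit of the credible interval from the median model.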
Random-intercept quantile regression
In the two regressions above, we ignored that students are grouped in schools and that students from the same school may be more alike than those from different schools. We should consider how this can affect the intercepts, which are our predictions for the median and 25th percentile of math5, respectively, when math3 is equal to 0. If we suspect the intercept may be different across schools, we could fit a random-intercept model with the bayesmh command.
Here we fit a Bayesian random-intercept model for the median, specifying asymlaplaceq({scale},0.5) in the likelihood() option with scale parameter {scale} and a quantile value of 0.5. We use fairly uninformative priors for the coefficients (normal(0,10000)), the scale (igamma(0.01,0.01)), and the random-intercept variance (igamma(0.01,0.01)). We use blocking to improve the Markov chain Monte Carlo sampling and specify a random seed for reproducibility.
. bayesmh math5 math3 U[school], likelihood(asymlaplaceq({scale},0.5))
>         prior({math5:}, normal(0,10000)) block({math5:})
>         prior({scale}, igamma(0.01,0.01)) block({scale})
>         prior({var_U}, igamma(0.01,0.01)) rseed(123)

Burn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done
Simulation 10000 .........1000.........2000.........3000.........4000.........50
> 00.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(xb_math5,{scale},0.5)

Priors:
  {math5:math3 _cons} ~ normal(0,10000)                                    (1)
          {U[school]} ~ normal(0,{var_U})                                  (1)
              {scale} ~ igamma(0.01,0.01)

Hyperprior:
  {var_U} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5.

Bayesian asymm. Laplace (quantile) regression    MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        887
                                                 Acceptance rate  =      .3405
                                                 Efficiency:  min =     .01407
                                                              avg =     .08571
Log marginal-likelihood                                       max =      .2184

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       math3 |   .582191   .0292987   .001198    .581954   .5278437   .6415957
       _cons |  31.10884   .3211908    .02708   31.11139   30.48056   31.71486
-------------+----------------------------------------------------------------
       scale |  1.978836   .0666567   .001426   1.979629   1.851416   2.115848
       var_U |  3.703947   1.093265   .048578   3.541629   1.960388   6.281583
------------------------------------------------------------------------------
The posterior mean estimate of the math3 coefficient is 0.58, and the posterior mean of the random-intercept variance is 3.70.
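The minimum efficiency reported in the header above is about 0.014, so some parameters mix more slowly than others. One way to see which ones, not shown above, is the standard bayesstats ess postestimation command, which reports effective sample sizes, correlation times, and efficiencies for all model parameters:

. bayesstats ess

Parameters with small effective sample sizes may call for a longer chain, different blocking, or thinning.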
Random-intercept, simultaneous quantile regression
Finally, as in our example with bayes: qreg, we may wonder whether the effect of math3 is different for different quantiles. We can use bayesmh to fit two Bayesian quantile regressions simultaneously using a two-equation specification.
For example, let's consider the 0.25 and 0.75 quantiles. We label the two regression equations math5_q25 and math5_q75 and again use uninformative priors for coefficients and variance parameters. We also use a common scale parameter, {scale}, because the outcome variable is the same in the two equations. The burn-in length is increased to 5,000 to improve sampling adaptation.
. bayesmh (math5_q25:math5 math3 U1[school],
>             likelihood(asymlaplaceq({scale},0.25)))
>         (math5_q75:math5 math3 U2[school],
>             likelihood(asymlaplaceq({scale},0.75))),
>         prior({math5_q25:}, normal(0,10000)) block({math5_q25:})
>         prior({math5_q75:}, normal(0,10000)) block({math5_q75:})
>         prior({scale}, igamma(0.01,0.01))
>         prior({var_U1}, igamma(0.01,0.01)) block({var_U1})
>         prior({var_U2}, igamma(0.01,0.01)) block({var_U2})
>         burnin(5000) rseed(123)

Burn-in 5000 aaaaaaaaa1000aaaaaaaaa2000aaaaaaaaa3000aaaaaaaaa4000aaaaaaaaa5000
> done
Simulation 10000 .........1000.........2000.........3000.........4000.........50
> 00.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(math5 math3 U1[school],{scale},0.25)
  math5 ~ asymlaplaceq(math5 math3 U2[school],{scale},0.75)

Prior:
  {scale} ~ igamma(0.01,0.01)

Hyperpriors:
  {math5_q25:math3 U1 _cons} ~ normal(0,10000)
  {math5_q75:math3 U2 _cons} ~ normal(0,10000)
             {var_U1 var_U2} ~ igamma(0.01,0.01)
                {U1[school]} ~ normal(0,{var_U1})
                {U2[school]} ~ normal(0,{var_U2})
------------------------------------------------------------------------------

Bayesian asymm. Laplace (quantile) regression    MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        887
                                                 Acceptance rate  =      .3253
                                                 Efficiency:  min =    .009337
                                                              avg =     .05927
Log marginal-likelihood                                       max =      .1921

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5_q25    |
       math3 |  .7803433   .0330395   .001954   .7792282   .7153192   .8437527
          U1 |         1          0         0          1          1          1
       _cons |  27.14928   .4539244   .046977   27.16099   26.16523   28.01503
-------------+----------------------------------------------------------------
math5_q75    |
       math3 |  .4011098   .0288801   .001227   .4015173   .3423591   .4565079
          U2 |         1          0         0          1          1          1
       _cons |  34.20564   .2810127   .021137   34.21637   33.65177   34.74148
-------------+----------------------------------------------------------------
       scale |  1.558671   .0384754   .000878    1.55878   1.484724   1.635941
      var_U1 |  7.916603    2.19551   .084288    7.64116   4.614094   13.10948
      var_U2 |  2.327081    .757776   .036133   2.226657   1.140627   4.148844
------------------------------------------------------------------------------
The posterior mean estimates of the math3 coefficient for the 0.25 and 0.75 quantiles are 0.78 and 0.40, respectively, which are above and below the earlier median estimate of 0.58. It appears that the regression slope decreases as the quantile value increases. We can therefore conclude that math3 is a stronger predictor of the 25th percentile of math5 scores than of the 75th percentile of math5 scores.
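We can make this comparison directly rather than eyeballing the two intervals. bayesstats summary accepts expressions of model parameters, so we can summarize the posterior distribution of the difference between the two slopes (the label diff below is our own):

. bayesstats summary (diff: {math5_q25:math3} - {math5_q75:math3})

A 95% credible interval for diff that lies entirely above 0 would give direct posterior evidence that the slope at the 0.25 quantile exceeds the slope at the 0.75 quantile.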
Parting words
StataNow added the asymmetric Laplace distribution to the list of likelihoods available for Bayesian models, which makes Bayesian quantile estimation possible. To learn more, see [BAYES] bayes: qreg and [BAYES] bayesmh.
References
Mortimore, P., P. Sammons, L. Stoll, D. Lewis, and R. Ecob. 1988. School Matters: The Junior Years. Wells, Somerset, UK: Open Books.
Yu, K., and R. A. Moyeed. 2001. Bayesian quantile regression. Statistics & Probability Letters 54: 437–447.
— Alvaro Fuentes Higuera
Senior Econometrician