
In the spotlight: Bayesian quantile regression

Everyone loves a good model for the conditional expectation. But in many applications, it can make more sense to model the median or some other quantile, conditional on covariates. Through quantile regression, as implemented in the qreg command, we can use a linear combination of our regressors to explain a quantile of the dependent variable. The math of it all has been around for a long time in the frequentist literature and is actually not too different from ordinary least squares.

The Bayesian equivalent of this, however, is a lot more recent and due to Yu and Moyeed (2001), who proposed using an asymmetric Laplace distribution for the likelihood function. Bayesian quantile regression uses this likelihood along with priors for our model parameters and returns full posterior distributions of those parameters.

As with so many other commands, Bayesian quantile regression in StataNow is now just a matter of typing a bayes prefix before your qreg command. For more complex tasks, like adding random intercepts or modeling multiple quantiles simultaneously, we can use our older friend bayesmh. I will take you through examples of both commands below.

And if you need some convincing to take these Bayesian methods for a spin with your data, keep in mind that the posterior standard deviations of Bayesian quantile estimation are model based and can be more efficient than the bootstrap-based and kernel-based standard errors of frequentist quantile models.

Let's see it work

The mathscores dataset contains the math scores of students in the third year (math3) and fifth year (math5) from different schools (school) in Inner London (Mortimore et al. 1988).

. webuse mathscores

. describe

Contains data from https://www.stata-press.com/data/r18/mathscores.dta
 Observations:           887                  
    Variables:             3                  9 May 2022 23:31

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------
school          float   %9.0g                 School ID
math3           float   %9.0g                 Year 3 math score
math5           float   %9.0g                 Year 5 math score
-------------------------------------------------------------------------
Sorted by:

. summarize math3 math5

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       math3 |        887    .3607666    5.783583        -21         11
       math5 |        887    30.56595    6.666168          5         40

Without specifying any options, the bayes: qreg command runs a regression where we explain the median fifth-year math score with the third-year math scores:

. bayes, rseed(123): qreg math5 math3
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(xb_math5_q50,{sigma},.5)

Priors:
  {math5_q50:math3 _cons} ~ normal(0,10000)                                (1)
                  {sigma} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5_q50.

Bayesian quantile regression                     MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
Quantile = .5                                    Number of obs    =        887
                                                 Acceptance rate  =      .3559
                                                 Efficiency:  min =       .123
                                                              avg =      .1553
Log marginal-likelihood = -2815.4069                          max =      .2133

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5_q50    |
       math3 |  .5908475   .0343072   .000978    .590633    .523432  .6547106
       _cons |  31.29088   .1988516   .005522   31.28314   30.89681  31.67256
-------------+----------------------------------------------------------------
       sigma |   2.15015    .073256   .001586   2.148896    2.01285  2.299333
------------------------------------------------------------------------------
Note: Adaptation tolerance is not met in at least one of the blocks.

Notice we used a seed, so you can reproduce this example. Taking the mean of the posterior distribution as our estimate of the effect of math3 on math5 (the first row of the Mean column in the output), we say that each additional point of math3 is associated with a median fifth-year math score that is 0.59 points higher. Our output also provides the 95% credible interval [0.523, 0.655], whose endpoints are the 0.025 and 0.975 quantiles of the posterior distribution of the math3 slope.
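
Before leaning on these summaries, it is worth checking that the chain behaves well. As a quick diagnostic sketch (not part of the original example), the bayesgraph diagnostics command displays trace, autocorrelation, histogram, and density plots for a parameter:

. bayesgraph diagnostics {math5_q50:math3}

A stable trace plot and quickly decaying autocorrelations would corroborate the efficiency figures reported in the header above.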

To model the 25th percentile (or 0.25 quantile) of math5 instead, we can simply type

. bayes, rseed(123): qreg math5 math3, quantile(0.25)

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(xb_math5_q25,{sigma},.25)

Priors:
  {math5_q25:math3 _cons} ~ normal(0,10000)                                (1)
                  {sigma} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5_q25.

Bayesian quantile regression                     MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
Quantile = .25                                   Number of obs    =        887
                                                 Acceptance rate  =      .3472
                                                 Efficiency:  min =      .1115
                                                              avg =      .1487
Log marginal-likelihood = -2958.6068                          max =      .1992

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5_q25    |
       math3 |  .8170189   .0314721   .000942   .8182132   .7535031  .8759191
       _cons |  26.90917   .1937895   .005269   26.91391   26.50618  27.26515
-------------+----------------------------------------------------------------
       sigma |  1.896149    .064245   .001439   1.894073   1.776397  2.026846
------------------------------------------------------------------------------

Here the mean of the posterior distribution of the math3 coefficient is 0.82, which indicates that each additional point of math3 is associated with a 0.82-point increase in the 25th percentile of math5 scores.

Of course, all existing Bayesian postestimation commands are available after bayes: qreg. For example, using the bayestest interval command, we can compute the posterior probability that the math3 coefficient from this last regression lies between 0.523 and 0.655 (that is, within the credible interval from our model for the median):

. bayestest interval {math5_q25:math3}, lower(0.523) upper(0.655)


Interval tests     MCMC sample size =    10,000

       prob1 : 0.523 < {math5_q25:math3} < .655

-----------------------------------------------
             |      Mean   Std. dev.      MCSE
-------------+---------------------------------
       prob1 |         0    0.00000         0
-----------------------------------------------

This results in a posterior probability of 0, suggesting that the effect of third-year scores on fifth-year scores is different for the 25th and 50th percentiles of math5. In other words, it seems that previous math performance is more predictive of the fifth-year math performance of students who score poorly on the fifth-year exam than it is of the performance of students who score near the median on the fifth-year exam.
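
We could also ask how much posterior mass lies above the upper bound of the median model's interval. As a sketch (output omitted here), specifying only lower() in bayestest interval yields the one-sided probability that the coefficient exceeds 0.655:

. bayestest interval {math5_q25:math3}, lower(0.655)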

Random-intercept quantile regression

In the two regressions above, we ignored that students are grouped in schools and that students from the same school may be more alike than those from different schools. We should consider how this can affect the intercepts, which are our predictions for the median and 25th percentile of math5, respectively, when math3 is equal to 0. If we suspect the intercept may be different across schools, we could fit a random-intercept model with the bayesmh command.

Here we fit a Bayesian random-intercept model for the median, specifying asymlaplaceq({scale},0.5) in the likelihood() option with a random scale parameter {scale} and a quantile value of 0.5. We use fairly uninformative priors for the coefficients (normal(0,10000)), scale (igamma(0.01,0.01)), and random-intercept variance (igamma(0.01,0.01)). We use blocking to improve the Markov chain Monte Carlo sampling and specify a random seed for reproducibility.

. bayesmh math5 math3 U[school],                         ///
     likelihood(asymlaplaceq({scale},0.5))               ///
     prior({math5:}, normal(0,10000)) block({math5:})    ///
     prior({scale},  igamma(0.01,0.01)) block({scale})   ///
     prior({var_U},  igamma(0.01,0.01)) rseed(123)

Burn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done
Simulation 10000 .........1000.........2000.........3000.........4000.........5000.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(xb_math5,{scale},0.5)

Priors:
  {math5:math3 _cons} ~ normal(0,10000)                                    (1)
          {U[school]} ~ normal(0,{var_U})                                  (1)
              {scale} ~ igamma(0.01,0.01)

Hyperprior:
  {var_U} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5.

Bayesian asymm. Laplace (quantile) regression    MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        887
                                                 Acceptance rate  =      .3405
                                                 Efficiency:  min =     .01407
                                                              avg =     .08571
Log marginal-likelihood                                       max =      .2184

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       math3 |   .582191   .0292987   .001198    .581954   .5278437  .6415957
       _cons |  31.10884   .3211908    .02708   31.11139   30.48056  31.71486
-------------+----------------------------------------------------------------
       scale |  1.978836   .0666567   .001426   1.979629   1.851416  2.115848
       var_U |  3.703947   1.093265   .048578   3.541629   1.960388  6.281583
------------------------------------------------------------------------------

The posterior mean estimate of the math3 coefficient is 0.58, and the posterior mean of the random-intercept variance {var_U} is 3.70.
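
The minimum sampling efficiency reported above is fairly low (about 0.014). As a quick follow-up, not shown in the original output, we could examine effective sample sizes for all model parameters:

. bayesstats ess

Parameters with small effective sample sizes may call for longer chains, different blocking, or thinning.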

Random-intercept, simultaneous quantile regression

Finally, as in our example with bayes: qreg, we may wonder whether the effect of math3 is different for different quantiles. We can use bayesmh to fit two Bayesian quantile regressions simultaneously using a two-equation specification.

For example, let's consider the 0.25 and 0.75 quantiles. We label the two regression equations math5_q25 and math5_q75 and again use uninformative priors for coefficients and variance parameters. We also use a common scale parameter, {scale}, because the outcome variable is the same in the two equations. The burn-in length is increased to 5,000 to improve sampling adaptation.

. bayesmh (math5_q25:math5 math3 U1[school], likelihood(asymlaplaceq({scale},0.25)))  ///
     (math5_q75:math5 math3 U2[school], likelihood(asymlaplaceq({scale},0.75))),      ///
     prior({math5_q25:}, normal(0,10000)) block({math5_q25:})                         ///
     prior({math5_q75:}, normal(0,10000)) block({math5_q75:})                         ///
     prior({scale}, igamma(0.01,0.01))                                                ///
     prior({var_U1}, igamma(0.01,0.01)) block({var_U1})                               ///
     prior({var_U2}, igamma(0.01,0.01)) block({var_U2})                               ///
     burnin(5000) rseed(123)

Burn-in 5000 aaaaaaaaa1000aaaaaaaaa2000aaaaaaaaa3000aaaaaaaaa4000aaaaaaaaa5000 done
Simulation 10000 .........1000.........2000.........3000.........4000.........5000.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(xb_math5_q25,{scale},0.25)
  math5 ~ asymlaplaceq(xb_math5_q75,{scale},0.75)

Priors:
  {math5_q25:math3 U1 _cons} ~ normal(0,10000)                             (1)
  {math5_q75:math3 U2 _cons} ~ normal(0,10000)                             (2)
                {U1[school]} ~ normal(0,{var_U1})                          (1)
                {U2[school]} ~ normal(0,{var_U2})                          (2)
                     {scale} ~ igamma(0.01,0.01)

Hyperpriors:
  {var_U1 var_U2} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5_q25.
(2) Parameters are elements of the linear form xb_math5_q75.

Bayesian asymm. Laplace (quantile) regression    MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        887
                                                 Acceptance rate  =      .3253
                                                 Efficiency:  min =    .009337
                                                              avg =     .05927
Log marginal-likelihood                                       max =      .1921

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5_q25    |
       math3 |  .7803433   .0330395   .001954   .7792282   .7153192  .8437527
          U1 |         1          0         0          1          1         1
       _cons |  27.14928   .4539244   .046977   27.16099   26.16523  28.01503
-------------+----------------------------------------------------------------
math5_q75    |
       math3 |  .4011098   .0288801   .001227   .4015173   .3423591  .4565079
          U2 |         1          0         0          1          1         1
       _cons |  34.20564   .2810127   .021137   34.21637   33.65177  34.74148
-------------+----------------------------------------------------------------
       scale |  1.558671   .0384754   .000878    1.55878   1.484724  1.635941
      var_U1 |  7.916603    2.19551   .084288    7.64116   4.614094  13.10948
      var_U2 |  2.327081    .757776   .036133   2.226657   1.140627  4.148844
------------------------------------------------------------------------------

The posterior mean estimates of the math3 coefficient for the 0.25 and 0.75 quantiles are 0.78 and 0.40, respectively, which are above and below the earlier median estimate of 0.58. It appears that the regression slope decreases as the quantile value increases. We can therefore conclude that math3 is a stronger predictor of the 25th percentile of math5 scores than of the 75th percentile of math5 scores.
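
Because both equations are fit in a single model, we can quantify this difference directly from the joint posterior. As a sketch (the diff label is our own choice), bayesstats summary accepts expressions of model parameters, and bayestest interval reports the posterior probability that the difference is positive:

. bayesstats summary (diff: {math5_q25:math3} - {math5_q75:math3})

. bayestest interval (diff: {math5_q25:math3} - {math5_q75:math3}), lower(0)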

Parting words

StataNow added the asymmetric Laplace distribution to the list of likelihoods available for Bayesian models, which makes Bayesian quantile estimation possible. To learn more, see [BAYES] bayes: qreg and [BAYES] bayesmh.

References

Mortimore, P., P. Sammons, L. Stoll, D. Lewis, and R. Ecob. 1988. School Matters: The Junior Years. Wells, Somerset, UK: Open Books.

Yu, K., and R. A. Moyeed. 2001. Bayesian quantile regression. Statistics & Probability Letters 54: 437–447.

— Alvaro Fuentes Higuera
Senior Econometrician
