In the spotlight: Bayesian quantile regression
Everyone loves a good model for the conditional expectation. But in many applications, it can make more sense to model the median or some other quantile, conditional on covariates. Through quantile regression, as implemented in the qreg command, we can use a linear combination of our regressors to explain a quantile of the dependent variable. The math of it all has been around for a long time in the frequentist literature and is actually not too different from ordinary least squares.
The Bayesian counterpart, however, is much more recent and is due to Yu and Moyeed (2001), who proposed using an asymmetric Laplace distribution for the likelihood function. Bayesian quantile regression combines this likelihood with priors for our model parameters and returns full posterior distributions of those parameters.
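For readers who want to see the connection, here is the standard form of the asymmetric Laplace density (a textbook result, not specific to Stata's implementation): with location $\mu$, scale $\sigma > 0$, and quantile $q \in (0,1)$,

$$
f(y \mid \mu, \sigma, q) \;=\; \frac{q(1-q)}{\sigma}\,
\exp\!\left\{-\rho_q\!\left(\frac{y-\mu}{\sigma}\right)\right\},
\qquad
\rho_q(u) \;=\; u\,\bigl(q - I(u < 0)\bigr),
$$

where $I(\cdot)$ is the indicator function. Maximizing this likelihood in $\mu = \mathbf{x}'\boldsymbol{\beta}$ is equivalent to minimizing the check-function loss $\sum_i \rho_q(y_i - \mathbf{x}_i'\boldsymbol{\beta})$ that defines classical quantile regression, which is what makes the asymmetric Laplace a natural working likelihood for the $q$th conditional quantile.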
As with so many other commands, Bayesian quantile regression in StataNow is now just a matter of typing a bayes prefix before your qreg command. For more complex tasks, like adding random intercepts or modeling multiple quantiles simultaneously, we can use our older friend bayesmh. I will take you through examples of both commands below.
And if you need some convincing to take these Bayesian methods for a spin with your data, keep in mind that the posterior standard deviations of Bayesian quantile estimation are model based and can be more efficient than the bootstrap-based and kernel-based standard errors of frequentist quantile models.
Let's see it work
The mathscores dataset contains the math scores of students in the third year (math3) and fifth year (math5) from different schools (school) in Inner London (Mortimore et al. 1988).
. webuse mathscores

. describe

Contains data from https://www.stata-press.com/data/r18/mathscores.dta
 Observations:           887
    Variables:             3                  9 May 2022 23:31
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
school          float   %9.0g                 School ID
math3           float   %9.0g                 Year 3 math score
math5           float   %9.0g                 Year 5 math score
-------------------------------------------------------------------------------

. summarize math3 math5

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       math3 |        887    .3607666    5.783583        -21         11
       math5 |        887    30.56595    6.666168          5         40
Without specifying any options, the bayes: qreg command fits a regression that models the median fifth-year math score as a function of third-year math scores:
. bayes, rseed(123): qreg math5 math3

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(xb_math5_q50,{sigma},.5)

Priors:
  {math5_q50:math3 _cons} ~ normal(0,10000)                                (1)
                  {sigma} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5_q50.

Bayesian quantile regression                     MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
Quantile = .5                                    Number of obs    =        887
                                                 Acceptance rate  =      .3559
                                                 Efficiency:  min =       .123
                                                              avg =      .1553
Log marginal-likelihood = -2815.4069                          max =      .2133

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5_q50    |
       math3 |  .5908475   .0343072   .000978    .590633    .523432   .6547106
       _cons |  31.29088   .1988516   .005522   31.28314   30.89681   31.67256
-------------+----------------------------------------------------------------
       sigma |   2.15015    .073256   .001586   2.148896    2.01285   2.299333
------------------------------------------------------------------------------
Notice that we specified a random-number seed so that you can reproduce this example. Taking the mean of the posterior distribution as our estimate of the effect of math3 on math5 (the first row of the Mean column in the output), we say that each additional point of math3 is associated with a median fifth-year math score that is 0.59 points higher. The output also provides the 95% credible interval [0.523, 0.655], whose endpoints are the 0.025 and 0.975 quantiles of the posterior distribution of the math3 slope.
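As with any MCMC-based method, it is good practice to check that the chain has converged before interpreting results. A quick visual check, not shown above, is the standard bayesgraph diagnostics postestimation command, which plots the trace, autocorrelation, histogram, and density of a parameter's MCMC sample:

. bayesgraph diagnostics {math5_q50:math3}

A trace plot that wanders without a stable mean, or autocorrelations that decay slowly, would suggest running a longer chain or adjusting the sampler.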
To model the 25th percentile (or 0.25 quantile) of math5 instead, we can simply type
. bayes, rseed(123): qreg math5 math3, quantile(0.25)

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(xb_math5_q25,{sigma},.25)

Priors:
  {math5_q25:math3 _cons} ~ normal(0,10000)                                (1)
                  {sigma} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5_q25.

Bayesian quantile regression                     MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
Quantile = .25                                   Number of obs    =        887
                                                 Acceptance rate  =      .3472
                                                 Efficiency:  min =      .1115
                                                              avg =      .1487
Log marginal-likelihood = -2958.6068                          max =      .1992

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5_q25    |
       math3 |  .8170189   .0314721   .000942   .8182132   .7535031   .8759191
       _cons |  26.90917   .1937895   .005269   26.91391   26.50618   27.26515
-------------+----------------------------------------------------------------
       sigma |  1.896149    .064245   .001439   1.894073   1.776397   2.026846
------------------------------------------------------------------------------
Here the mean of the posterior distribution of the math3 coefficient is 0.82, which indicates that each additional point of math3 is associated with a 0.82-point increase in the 25th percentile of math5 scores.
Of course, all existing Bayesian postestimation commands are available after bayes: qreg. For example, with the bayestest interval command, we can compute the posterior probability that the math3 coefficient in this last regression lies between 0.523 and 0.655 (that is, within the credible interval from our model for the median):
. bayestest interval {math5_q25:math3}, lower(0.523) upper(0.655)

Interval tests     MCMC sample size =    10,000

       prob1 : 0.523 < {math5_q25:math3} < .655

-----------------------------------------------
             |      Mean    Std. dev.      MCSE
-------------+---------------------------------
       prob1 |         0      0.00000         0
-----------------------------------------------
The posterior probability is essentially 0, suggesting that the effect of third-year scores on fifth-year scores differs between the 25th and 50th percentiles of math5. In other words, previous math performance appears to be more predictive of the fifth-year math performance of students who score poorly on the fifth-year exam than of the performance of students who score near the median.
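We can also look at this from the other direction. bayestest interval estimates the probability of any interval; because the posterior for this slope places essentially no mass above 1, the interval (0.655, 1) serves as a stand-in for "greater than 0.655" (these bounds are our own choice for illustration):

. bayestest interval {math5_q25:math3}, lower(0.655) upper(1)

A posterior probability near 1 here would confirm that the q25 slope almost surely exceeds the upper limit of the credible interval from the median model.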
Random-intercept quantile regression
In the two regressions above, we ignored that students are grouped in schools and that students from the same school may be more alike than those from different schools. We should consider how this can affect the intercepts, which are our predictions for the median and 25th percentile of math5, respectively, when math3 is equal to 0. If we suspect the intercept may be different across schools, we could fit a random-intercept model with the bayesmh command.
Here we fit a Bayesian random-intercept model for the median, specifying asymlaplaceq({scale},0.5) in the likelihood() option with scale parameter {scale} and a quantile value of 0.5. We use fairly uninformative priors for the coefficients (normal(0,10000)), the scale (igamma(0.01,0.01)), and the random-intercept variance (igamma(0.01,0.01)). We use blocking to improve the Markov chain Monte Carlo sampling and specify a random seed for reproducibility.
. bayesmh math5 math3 U[school], likelihood(asymlaplaceq({scale},0.5))
>         prior({math5:}, normal(0,10000)) block({math5:})
>         prior({scale}, igamma(0.01,0.01)) block({scale})
>         prior({var_U}, igamma(0.01,0.01)) rseed(123)

Burn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done
Simulation 10000 .........1000.........2000.........3000.........4000.........50
> 00.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(xb_math5,{scale},0.5)

Priors:
  {math5:math3 _cons} ~ normal(0,10000)                                    (1)
          {U[school]} ~ normal(0,{var_U})                                  (1)
              {scale} ~ igamma(0.01,0.01)

Hyperprior:
  {var_U} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5.

Bayesian asymm. Laplace (quantile) regression    MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        887
                                                 Acceptance rate  =      .3405
                                                 Efficiency:  min =     .01407
                                                              avg =     .08571
Log marginal-likelihood                                       max =      .2184

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       math3 |   .582191   .0292987   .001198    .581954   .5278437   .6415957
       _cons |  31.10884   .3211908    .02708   31.11139   30.48056   31.71486
-------------+----------------------------------------------------------------
       scale |  1.978836   .0666567   .001426   1.979629   1.851416   2.115848
       var_U |  3.703947   1.093265   .048578   3.541629   1.960388   6.281583
------------------------------------------------------------------------------
The posterior mean estimate of the math3 coefficient is 0.58, and the posterior mean of the random-intercept variance is 3.70.
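The minimum efficiency reported in the header above is about 0.014, so some parameters mix more slowly than others. One way to see which ones, not shown above, is the standard bayesstats ess postestimation command, which reports effective sample sizes, correlation times, and efficiencies for all model parameters:

. bayesstats ess

Parameters with small effective sample sizes may call for a longer chain, different blocking, or thinning.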
Random-intercept, simultaneous quantile regression
Finally, as in our example with bayes: qreg, we may wonder whether the effect of math3 is different for different quantiles. We can use bayesmh to fit two Bayesian quantile regressions simultaneously using a two-equation specification.
For example, let's consider the 0.25 and 0.75 quantiles. We label the two regression equations math5_q25 and math5_q75 and again use uninformative priors for coefficients and variance parameters. We also use a common scale parameter, {scale}, because the outcome variable is the same in the two equations. The burn-in length is increased to 5,000 to improve sampling adaptation.
. bayesmh (math5_q25:math5 math3 U1[school],
>             likelihood(asymlaplaceq({scale},0.25)))
>         (math5_q75:math5 math3 U2[school],
>             likelihood(asymlaplaceq({scale},0.75))),
>         prior({math5_q25:}, normal(0,10000)) block({math5_q25:})
>         prior({math5_q75:}, normal(0,10000)) block({math5_q75:})
>         prior({scale}, igamma(0.01,0.01))
>         prior({var_U1}, igamma(0.01,0.01)) block({var_U1})
>         prior({var_U2}, igamma(0.01,0.01)) block({var_U2})
>         burnin(5000) rseed(123)

Burn-in 5000 aaaaaaaaa1000aaaaaaaaa2000aaaaaaaaa3000aaaaaaaaa4000aaaaaaaaa5000
> done
Simulation 10000 .........1000.........2000.........3000.........4000.........50
> 00.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(math5 math3 U1[school],{scale},0.25)
  math5 ~ asymlaplaceq(math5 math3 U2[school],{scale},0.75)

Prior:
  {scale} ~ igamma(0.01,0.01)

Hyperpriors:
  {math5_q25:math3 U1 _cons} ~ normal(0,10000)
  {math5_q75:math3 U2 _cons} ~ normal(0,10000)
             {var_U1 var_U2} ~ igamma(0.01,0.01)
                {U1[school]} ~ normal(0,{var_U1})
                {U2[school]} ~ normal(0,{var_U2})
------------------------------------------------------------------------------

Bayesian asymm. Laplace (quantile) regression    MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        887
                                                 Acceptance rate  =      .3253
                                                 Efficiency:  min =    .009337
                                                              avg =     .05927
Log marginal-likelihood                                       max =      .1921

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5_q25    |
       math3 |  .7803433   .0330395   .001954   .7792282   .7153192   .8437527
          U1 |         1          0         0          1          1          1
       _cons |  27.14928   .4539244   .046977   27.16099   26.16523   28.01503
-------------+----------------------------------------------------------------
math5_q75    |
       math3 |  .4011098   .0288801   .001227   .4015173   .3423591   .4565079
          U2 |         1          0         0          1          1          1
       _cons |  34.20564   .2810127   .021137   34.21637   33.65177   34.74148
-------------+----------------------------------------------------------------
       scale |  1.558671   .0384754   .000878    1.55878   1.484724   1.635941
      var_U1 |  7.916603    2.19551   .084288    7.64116   4.614094   13.10948
      var_U2 |  2.327081    .757776   .036133   2.226657   1.140627   4.148844
------------------------------------------------------------------------------
The posterior mean estimates of the math3 coefficient for the 0.25 and 0.75 quantiles are 0.78 and 0.40, respectively, which are above and below the earlier median estimate of 0.58. It appears that the regression slope decreases as the quantile value increases. We can therefore conclude that math3 is a stronger predictor of the 25th percentile of math5 scores than of the 75th percentile of math5 scores.
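We can make this comparison directly rather than eyeballing the two intervals. bayesstats summary accepts expressions of model parameters, so we can summarize the posterior distribution of the difference between the two slopes (the label diff below is our own):

. bayesstats summary (diff: {math5_q25:math3} - {math5_q75:math3})

A 95% credible interval for diff that lies entirely above 0 would give direct posterior evidence that the slope at the 0.25 quantile exceeds the slope at the 0.75 quantile.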
Parting words
StataNow added the asymmetric Laplace distribution to the list of likelihoods available for Bayesian models, which makes Bayesian quantile estimation possible. To learn more, see [BAYES] bayes: qreg and [BAYES] bayesmh.
References
Mortimore, P., P. Sammons, L. Stoll, D. Lewis, and R. Ecob. 1988. School Matters: The Junior Years. Wells, Somerset, UK: Open Books.
Yu, K., and R. A. Moyeed. 2001. Bayesian quantile regression. Statistics & Probability Letters 54: 437–447.
— Alvaro Fuentes Higuera
Senior Econometrician