Bayesian analysis: Gelman-Rubin convergence diagnostic

Highlights

  • Compute using multiple chains
  • Maximum diagnostic automatically reported by bayes: and bayesmh
  • Compute diagnostics for each parameter
  • Compute diagnostics for functions of parameters

Establishing convergence of Markov chain Monte Carlo (MCMC) is one of the most important steps of Bayesian analysis. Multiple chains are often used to check MCMC convergence. The Gelman–Rubin convergence diagnostic provides a numerical convergence summary based on multiple chains. Stata's new command bayesstats grubin computes this diagnostic for all model parameters. Also, when you simulate multiple chains using bayes: or bayesmh, these commands automatically report the maximum of the convergence diagnostics across all model parameters.

After using bayes, nchains(): or bayesmh, nchains() to simulate multiple chains, look at the maximum Gelman–Rubin diagnostic that these commands report in the header. When it suggests nonconvergence, you can examine the individual diagnostics to identify which parameters have convergence problems. Type

. bayesstats grubin

Sometimes nonconvergence arises because you did not run enough simulations. Other times, it arises because of inherent problems with the model specification. If you run more simulations and the chains still do not converge, with the same parameters repeatedly flagged as problematic, you should consider respecifying your model.
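
In the first case, it often suffices to lengthen the burn-in and enlarge the MCMC sample before recomputing the diagnostics. A minimal sketch using the burnin() and mcmcsize() options, with a hypothetical outcome y and predictors x1 and x2 and arbitrary option values chosen only for illustration:

. bayes, nchains(3) burnin(10000) mcmcsize(20000) rseed(16): regress y x1 x2
. bayesstats grubin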

Let's see it work

Consider the diabetes data of Efron et al. (2004), which record, for 442 diabetes patients, a measure of disease progression as the outcome y and 10 baseline covariates: age, sex, body mass index, mean arterial pressure, and six blood serum measurements. We consider the version of these data in which the covariates are standardized. This dataset is commonly used to demonstrate various model-building techniques, including Bayesian lasso. Here we use it to demonstrate the bayesstats grubin command.
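
Before fitting any models, it can help to glance at the data. A minimal sketch, assuming the standardized data have been saved locally as diabetes.dta (a filename we choose only for illustration):

. use diabetes, clear
. describe
. summarize y age sex bmi map tc ldl hdl tch ltg glu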

Example of nonconvergence

We use bayes: regress to fit the Bayesian linear regression of y on all variables in the dataset and use option nchains(3) to generate three chains.

. bayes, nchains(3) rseed(16): regress y age sex bmi map tc ldl hdl tch ltg glu

Chain 1
  Burn-in ...
  Simulation ...

Chain 2
  Burn-in ...
  Simulation ...

Chain 3
  Burn-in ...
  Simulation ...

Model summary
Likelihood:
  y ~ regress(xb_y,{sigma2})

Priors:
  {y:age sex bmi map tc ldl hdl tch ltg glu _cons} ~ normal(0,10000)   (1)
  {sigma2} ~ igamma(.01,.01)

(1) Parameters are elements of the linear form xb_y.

Bayesian linear regression
  Number of chains            =          3
  Random-walk Metropolis-Hastings sampling
  Per MCMC chain:
    Iterations                =     12,500
    Burn-in                   =      2,500
    Sample size               =     10,000
  Number of obs               =        442
  Avg acceptance rate         =       .321
  Avg efficiency:  min = .003771   avg = .01886   max = .1142
  Avg log marginal-likelihood = -2457.6885
  Max Gelman-Rubin Rc         =      4.543

------------------------------------------------------------------------------
             |                                              Equal-tailed
             |      Mean   Std. Dev.    MCSE      Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
y            |
         age |  81.50951    87.57016  4.06945    71.7336  -50.84804   234.4335
         sex | -167.0296     71.9136  6.76093   -167.887   -289.098  -42.73281
         bmi |  352.3836    99.94654  6.13795    360.123    193.378   505.5432
         map |  286.0075    78.16619  5.23075   275.4511   174.2242   426.8137
          tc | -273.1433    164.7453   14.618  -295.8386  -501.8144  -16.60783
         ldl |   165.232     178.513  8.84052    231.598  -125.8265   352.7025
         hdl | -94.36373    114.5865  7.16308  -106.0647  -263.5826   99.65676
         tch |   109.925    162.0595  13.0143   134.8212  -155.8764   332.5971
         ltg |   483.243    102.1253  6.00414   495.4546   307.5461   627.0424
         glu |  42.76467     117.208  6.29098   65.24344  -146.8923    198.615
       _cons |  152.4957    2.780968  .103842   152.4606   147.3315   157.9954
-------------+----------------------------------------------------------------
      sigma2 |   3110.56    221.9696  3.79219   3100.067   2707.435   3566.951
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.
Note: Default initial values are used for multiple chains.
Note: There is a high autocorrelation after 500 lags in at least one of the chains.

The maximum Gelman–Rubin diagnostic across all model parameters is labeled Max Gelman-Rubin Rc in the header; here it is 4.543. Gelman and Rubin (1992) and Brooks and Gelman (1998) suggest that an Rc value greater than 1.2 for any model parameter indicates nonconvergence. In practice, the more stringent rule Rc < 1.1 is often used to declare convergence.
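
For intuition, here is a sketch of the computation behind the diagnostic for a single parameter; the exact formula Stata implements follows the references below and may include adjustments not shown here. With $M$ chains of length $n$, chain means $\bar{\theta}_m$, chain variances $s_m^2$, and overall mean $\bar{\theta}$,

$$
W = \frac{1}{M}\sum_{m=1}^{M} s_m^2, \qquad
B = \frac{n}{M-1}\sum_{m=1}^{M}\left(\bar{\theta}_m - \bar{\theta}\right)^2,
$$

$$
\widehat{V} = \frac{n-1}{n}\,W + \frac{1}{n}\,B, \qquad
R_c \approx \sqrt{\widehat{V}/W}.
$$

Once all chains sample from the same stationary distribution, the between-chain component $B$ adds little to the within-chain variance $W$, and Rc approaches 1 from above.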

In our example, the maximum Rc of 4.543 is larger than 1.1, so our MCMC did not converge. The presence of the note about high autocorrelation is also concerning, although it may not necessarily indicate nonconvergence.

We can explore Gelman–Rubin diagnostics for all parameters to identify the problematic ones. We use bayesstats grubin for that. We additionally sort the results from largest to smallest to find the most problematic parameters.

. bayesstats grubin, sort

Gelman-Rubin convergence diagnostic

Number of chains     =            3
MCMC size, per chain =       10,000
Max Gelman-Rubin Rc  =     4.542823
--------------------------
             |          Rc
-------------+------------
y            |
         ldl |    4.542823
         tch |    3.376646
          tc |    3.184339
         glu |    3.089546
         bmi |    2.543104
         hdl |    2.447089
         map |    2.421061
         age |     2.40928
         ltg |    2.389624
         sex |    1.680013
       _cons |    1.082658
-------------+------------
      sigma2 |    1.023543
--------------------------
Convergence rule: Rc < 1.1

The coefficient for ldl has the maximum Rc value. All parameters except the last two, {y:_cons} and {sigma2}, have Rc values exceeding 1.1. Our model clearly has convergence issues.
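
The diagnostic need not be limited to individual model parameters. As listed in the highlights, it can also be computed for functions of parameters. A hedged sketch, in which the ratio of two coefficients and its label are chosen purely for illustration:

. bayesstats grubin (ratio: {y:ldl}/{y:hdl})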

Let's look at graphical diagnostics for the ldl coefficient.

. bayesgraph diagnostics {y:ldl}

There is a clear separation in the trace plots of the chains, and the distributions of all three chains are different.

So what do we do now? The answer is model- and data-specific. Sometimes we may simply need to increase our burn-in period, option burnin(). Other times, especially in the presence of highly correlated model parameters, we may need to improve the efficiency of our sampling procedure by blocking parameters or by using more efficient samplers; see Specifying MCMC sampling procedure. When the data provide little information about some of the model parameters, we may need to use more informative priors or modify our model entirely. See Convergence of MCMC for other suggestions.
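
To illustrate the blocking suggestion, here is a hedged sketch using the block() option of the bayes prefix; the particular grouping of the serum coefficients below is ours, chosen only for illustration, and is not a recommendation for these data:

. bayes, nchains(3) rseed(16) block({y:tc ldl hdl tch}): regress y age sex bmi map tc ldl hdl tch ltg glu
. bayesstats grubin, sort

Whether blocking alone resolves the problem is model specific; in the next section, we instead switch to Gibbs sampling.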

We provide the solution for our model in the next section.

Example of convergence

In the above Example of nonconvergence, we fit a Bayesian linear regression to the diabetes data and encountered problems with MCMC convergence using our default model specification.

From the bayes: regress output above, there appears to be a large variation in the magnitudes of the estimated coefficients. We may try placing parameters with similar magnitudes in separate blocks to improve simulation efficiency.

Also, many variables in the diabetes dataset are correlated, and some are highly correlated. This induces high correlations between some of the model parameters. (For instance, you can use bayesgraph matrix to explore pairwise correlations between model parameters.) The default adaptive Metropolis–Hastings sampling used by bayes: and bayesmh may suffer from low efficiency in the presence of many highly correlated model parameters. In our example, the default priors used by bayes: regress (normal for the coefficients and inverse gamma for the variance) allow us to use more efficient Gibbs sampling.

. bayes, nchains(3) rseed(16) gibbs: regress y age sex bmi map tc ldl hdl tch ltg glu

Chain 1
  Burn-in ...
  Simulation ...

Chain 2
  Burn-in ...
  Simulation ...

Chain 3
  Burn-in ...
  Simulation ...

Model summary
Likelihood:
  y ~ normal(xb_y,{sigma2})

Priors:
  {y:age sex bmi map tc ldl hdl tch ltg glu _cons} ~ normal(0,10000)   (1)
  {sigma2} ~ igamma(.01,.01)

(1) Parameters are elements of the linear form xb_y.

Bayesian linear regression
  Number of chains            =          3
  Gibbs sampling
  Per MCMC chain:
    Iterations                =     12,500
    Burn-in                   =      2,500
    Sample size               =     10,000
  Number of obs               =        442
  Avg acceptance rate         =          1
  Avg efficiency:  min = .9041   avg = .9853   max = 1
  Avg log marginal-likelihood = -2435.5507
  Max Gelman-Rubin Rc         =          1

------------------------------------------------------------------------------
             |                                              Equal-tailed
             |      Mean   Std. Dev.    MCSE      Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
y            |
         age |  13.27542    51.20421  .295628   13.62079   -86.6546    112.654
         sex |  -163.025    51.94371  .299897  -163.0511  -265.2759  -62.49451
         bmi |   428.655     54.8511  .325351   428.6502   321.9517   536.9177
         map |   269.312    53.95068  .311484   269.9104   162.9158    375.273
          tc | -33.02954    75.55542   .43803  -32.85623  -182.0179   113.7366
         ldl | -72.58636    71.89936  .415111  -72.42117  -213.1838   68.32667
         hdl |  -185.885    65.39935  .377583  -185.9721  -313.8126  -57.03217
         tch |  121.3566    74.05157  .427537   121.5397  -24.13796   267.8116
         ltg |  370.8014    61.66557  .357777   371.1397   248.9354   492.0922
         glu |  103.5797    54.67812  .317191   103.3001  -3.523386   211.1194
       _cons |  152.0611    2.620844  .015131   152.0733   146.9208   157.1492
-------------+----------------------------------------------------------------
      sigma2 |  3019.896    210.3033  1.27697   3010.666   2634.993   3462.634
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.
Note: Default initial values are used for multiple chains.

The maximum Gelman–Rubin diagnostic is now 1 (or very close to one) and is less than 1.1, which indicates that no convergence issues are detected.

We can look at the graphical diagnostics for the ldl coefficient again.

. bayesgraph diagnostics {y:ldl}

The trace plots of the three chains overlap perfectly, and so do their distributions. The autocorrelation in all three chains is essentially nonexistent, as expected with Gibbs sampling.
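
To confirm the graphical impression numerically, we could recompute the per-parameter diagnostics and inspect sampling efficiencies. A brief sketch, with output omitted here:

. bayesstats grubin, sort
. bayesstats ess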

References

Brooks, S. P., and A. Gelman. 1998. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7: 434–455.

Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani. 2004. Least angle regression. The Annals of Statistics 32: 407–499.

Gelman, A., and D. B. Rubin. 1992. Inference from iterative simulation using multiple sequences. Statistical Science 7: 457–472.

Tell me more

Learn more about multiple chains, Bayesian predictions, and Bayesian lasso.

Learn more about Stata's Bayesian analysis features.

Read more about the Gelman–Rubin convergence diagnostic and Bayesian analysis in the Stata Bayesian Analysis Reference Manual.
