

Highlights

  • Bayesian bootstrap for official or community-contributed commands

  • Continuous importance weights instead of traditional frequency weights

  • Priors for sampling observations

  • Better small-sample performance than classic bootstrap

  • See more resampling features

Use the new bayesboot prefix to perform Bayesian bootstrap to obtain more precise parameter estimates in small samples and incorporate prior information when sampling observations. Use it with official commands or community-contributed commands.

The Bayesian bootstrap, pioneered by Rubin (1981), offers an alternative to traditional bootstrap methods by leveraging Bayesian principles. Instead of sampling with replacement where observations are either included or excluded, the Bayesian bootstrap assigns continuous importance weights to each observation from a Dirichlet distribution. This approach directly models uncertainty about how representative each data point is of the underlying population.

The Bayesian bootstrap allows us to interpret the representativeness of each observation as a posterior distribution of importance weights. Within this Bayesian framework, researchers can incorporate their prior knowledge when assigning weights to each observation. Additionally, the distribution from which observations are drawn is smooth, whereas the traditional bootstrap methods make discrete inclusion and exclusion decisions. This smoothness makes Bayesian bootstrap immune to certain issues that arise with traditional bootstrap, such as replicates with collinearity or situations where entire categories may not be represented.

The bayesboot command performs Bayesian bootstrap by generating importance replication weights for each observation from a Dirichlet distribution and using them when estimating parameters and statistics. By default, each observation has the same probability of being selected, but you can customize this to include more informative priors for observations by using the priorpowers() option. bayesboot works seamlessly with official and community-contributed commands, similarly to the existing bootstrap prefix.
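The weight-generation idea is easy to see outside Stata. The sketch below, in Python with NumPy (a conceptual illustration, not Stata code), draws flat Dirichlet importance weights for 69 simulated observations and computes a weighted statistic per replicate, mirroring what bayesboot does internally. The data are a made-up stand-in sample, not the auto dataset.

```python
import numpy as np

rng = np.random.default_rng(111)
data = rng.normal(loc=21, scale=5, size=69)   # stand-in for 69 mpg values
n, reps = data.size, 50

# Each Bayesian bootstrap replicate draws one importance weight per
# observation from a flat Dirichlet(1, ..., 1); within a replicate the
# weights are strictly positive and sum to 1.
weights = rng.dirichlet(np.ones(n), size=reps)   # shape (reps, n)

# The statistic is then computed with these weights instead of resampling
# rows with replacement; here, a weighted mean per replicate.
boot_means = weights @ data

print(round(boot_means.std(ddof=1), 4))   # bootstrap standard error of the mean
```

The spread of `boot_means` across replicates plays the same role as the spread of coefficient estimates across bootstrap resamples.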

Let's see it work

Bayesian bootstrap and traditional bootstrap

Let's compare Bayesian bootstrap with traditional bootstrap by applying them to regression coefficients of a linear regression. We analyze how vehicle price (price) and repair records (rep78) affect fuel efficiency (mpg) by using the auto dataset.

We first perform traditional bootstrap by using the existing bootstrap prefix and then Bayesian bootstrap by using the new bayesboot prefix. We specify the rseed(111) option with both for reproducibility.

. sysuse auto
(1978 automobile data)

. drop if rep78 == .
(5 observations deleted)

. bootstrap, rseed(111): regress mpg price i.rep78
(running regress on estimation sample)

Bootstrap replications (50): .x.......xx........x.........30.........40..x......
> 50 done
x: Error occurred when bootstrap executed regress.

Linear regression                                       Number of obs =     69
                                                        Replications  =     45
                                                        Wald chi2(5)  =  30.44
                                                        Prob > chi2   = 0.0000
                                                        R-squared     = 0.4241
                                                        Adj R-squared = 0.3784
                                                        Root MSE      = 4.6251

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
         mpg | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |  -.0008829   .0002191    -4.03   0.000    -.0013124   -.0004535
             |
       rep78 |
          2  |  -.6361411   2.359325    -0.27   0.787    -5.260334    3.988051
          3  |   .0797594   2.063937     0.04   0.969    -3.965483    4.125002
          4  |    1.99724   2.207023     0.90   0.365    -2.328446    6.322925
          5  |   7.554265   3.228696     2.34   0.019     1.226137    13.88239
             |
       _cons |   25.03013    2.02501    12.36   0.000     21.06118    28.99907
------------------------------------------------------------------------------
Note: One or more parameters could not be estimated in 5 bootstrap replicates;
      standard-error estimates include only complete replications.

Now let's perform the same analysis using Bayesian bootstrap. We also specify bayesboot's generate() option to save the generated importance weights in the new variables iw1 through iw50 for later comparison.

. bayesboot, rseed(111) generate(iw): regress mpg price i.rep78
(running regress on estimation sample)

Bayesian bootstrap replications (50): .........10.........20.........30.........
> 40.........50 done

Bayesian bootstrap
Observation prior: Improper

Linear regression                                       Number of obs =     69
                                                        Replications  =     50
                                                        Wald chi2(5)  =  44.10
                                                        Prob > chi2   = 0.0000
                                                        R-squared     = 0.4241
                                                        Adj R-squared = 0.3784
                                                        Root MSE      = 4.6251

------------------------------------------------------------------------------
             |               Bayesian
             |   Observed   bootstrap                         Normal-based
         mpg | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |  -.0008829   .0001901    -4.64   0.000    -.0012556   -.0005103
             |
       rep78 |
          2  |  -.6361411   1.876978    -0.34   0.735     -4.31495    3.042667
          3  |   .0797594   1.652173     0.05   0.961    -3.158441    3.317959
          4  |    1.99724   1.926711     1.04   0.300    -1.779045    5.773525
          5  |   7.554265   2.427152     3.11   0.002     2.797136    12.31139
             |
       _cons |   25.03013   1.980008    12.64   0.000     21.14938    28.91087
------------------------------------------------------------------------------

Although both methods lead to similar overall conclusions, an advantage of Bayesian bootstrap can be seen from the replication output. Notice the “x” markers in the traditional bootstrap results: they indicate replications that could not be computed, leaving missing values for the coefficient estimates in those replications. This can happen when a resample is perfectly collinear or when some of rep78's categories have no observations in that resample, so their coefficients cannot be estimated. In contrast, bayesboot completes all 50 replications without errors.

This improved stability stems from the use of continuous weights by Bayesian bootstrap, as opposed to the discrete resampling of traditional bootstrap. The continuous weighting approach maintains greater numerical stability by avoiding the perfect collinearity that sometimes occurs with discrete resampling.
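A rough sanity check of this point can be done in Python (again, an illustration rather than Stata code), assuming a category held by only 2 of the 69 rows, as rep78 == 1 is in the auto data. With-replacement resampling misses both rows in a nontrivial share of replicates, whereas Dirichlet weights never zero out any observation.

```python
import numpy as np

rng = np.random.default_rng(111)
n = 69
rare = np.zeros(n, dtype=bool)
rare[:2] = True   # a category represented by only 2 of the 69 rows

# Traditional bootstrap: resample row indices with replacement. Some
# replicates miss the rare category entirely, so its coefficient cannot
# be estimated in those replicates.
dropped = sum(
    not rare[rng.integers(0, n, size=n)].any() for _ in range(1000)
)

# Bayesian bootstrap: Dirichlet weights are strictly positive, so every
# observation, and hence every category, contributes to every replicate.
w = rng.dirichlet(np.ones(n), size=1000)
always_positive = (w[:, rare] > 0).all()

print(dropped, bool(always_positive))
```

The expected miss rate here is about (67/69)^69 ≈ 13%, which matches the intuition that rare categories are the usual cause of failed replicates.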

Incorporating prior information

One of Bayesian bootstrap's key advantages is the ability to incorporate domain knowledge by specifying priors for observations when you have information about the relative importance or reliability of observations.

Below, we explore how different prior values affect estimation precision and statistical significance by using the priorpowers() option to modify the default prior.

. generate priorvar = rbeta(2,7)+2

. bayesboot, priorpowers(priorvar) rseed(111): regress mpg price i.rep78
(running regress on estimation sample)

Bayesian bootstrap replications (50): .........10.........20.........30.........
> 40.........50 done

Bayesian bootstrap
Observation prior: priorvar

Linear regression                                       Number of obs =     69
                                                        Replications  =     50
                                                        Wald chi2(5)  = 124.40
                                                        Prob > chi2   = 0.0000
                                                        R-squared     = 0.4241
                                                        Adj R-squared = 0.3784
                                                        Root MSE      = 4.6251

------------------------------------------------------------------------------
             |               Bayesian
             |   Observed   bootstrap                         Normal-based
         mpg | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |  -.0008829   .0001067    -8.28   0.000     -.001092   -.0006738
             |
       rep78 |
          2  |  -.6361411   1.002831    -0.63   0.526    -2.601653    1.329371
          3  |   .0797594   .9792511     0.08   0.935    -1.839538    1.999056
          4  |    1.99724   .9489378     2.10   0.035     .1373558    3.857124
          5  |   7.554265   1.481269     5.10   0.000     4.651031     10.4575
             |
       _cons |   25.03013   1.026821    24.38   0.000     23.01759    27.04266
------------------------------------------------------------------------------

Looking at the coefficient for 4.rep78, we see that its confidence interval includes 0 with the default prior, whereas it does not with our custom priors. This occurs because larger prior power values represent a stronger belief in the representativeness of the data: they increase the concentration of the Dirichlet distribution, which reduces the variability of the importance weights and yields narrower confidence intervals.

bayesboot as a wrapper

The bayesboot command is a convenience wrapper that combines the following two features:

  1. The rwgen bayes command, which generates importance weights based on the Bayesian bootstrap method
  2. bootstrap's iweights() option, which applies these weights during estimation

We can replicate the results from bayesboot in the previous example by specifying the following two commands.

. rwgen bayes myiw, priorpowers(priorvar) rseed(111)

. bootstrap, iweights(myiw1-myiw50): regress mpg price i.rep78
(running regress on estimation sample)

Bootstrap replications (50): .........10.........20.........30.........40......
> ...50 done

Linear regression                                       Number of obs =     69
                                                        Replications  =     50
                                                        Wald chi2(5)  = 124.40
                                                        Prob > chi2   = 0.0000
                                                        R-squared     = 0.4241
                                                        Adj R-squared = 0.3784
                                                        Root MSE      = 4.6251

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
         mpg | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |  -.0008829   .0001067    -8.28   0.000     -.001092   -.0006738
             |
       rep78 |
          2  |  -.6361411   1.002831    -0.63   0.526    -2.601653    1.329371
          3  |   .0797594   .9792511     0.08   0.935    -1.839538    1.999056
          4  |    1.99724   .9489378     2.10   0.035     .1373558    3.857124
          5  |   7.554265   1.481269     5.10   0.000     4.651031     10.4575
             |
       _cons |   25.03013   1.026821    24.38   0.000     23.01759    27.04266
------------------------------------------------------------------------------

The impact of custom priors

To understand how custom priors affect our analysis, let's compare the distributions of the default and custom weights for the first replicate:

. summarize iw1 myiw1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         iw1 |         69    .0144928    .0157269   .0001861   .0751048
      myiw1  |         69    .0144928    .0077043   .0016568   .0385901

The summary statistics reveal important differences in the distributions of weights. Although both sets maintain the same mean (1/69 = 0.0144928), the custom weights based on our higher prior values show substantially lower variability. This difference in variability has a direct impact on our regression results, as we saw earlier.
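The same variance-reduction effect can be reproduced outside Stata. The Python sketch below (an illustration, with `rng.beta(2, 7) + 2` standing in for the priorvar values generated by rbeta(2,7)+2) compares weight spread under the flat default prior and under prior powers near 2.2.

```python
import numpy as np

rng = np.random.default_rng(111)
n = 69

# Default prior: flat Dirichlet(1, ..., 1) importance weights.
flat = rng.dirichlet(np.ones(n), size=2000)

# Custom prior powers near 2.2, mimicking rbeta(2,7) + 2: the Dirichlet
# concentration parameters grow, so the weights cluster more tightly
# around their common mean of 1/n.
alpha = rng.beta(2, 7, size=n) + 2
custom = rng.dirichlet(alpha, size=2000)

print(round(flat.std(), 4), round(custom.std(), 4))  # custom is markedly smaller
```

Both sets of weights average 1/69 by construction, but the larger concentration parameters shrink their standard deviation, just as the summarize output shows for iw1 versus myiw1.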

Reference

Rubin, D. B. 1981. The Bayesian bootstrap. Annals of Statistics 9: 130–134. https://doi.org/10.1214/aos/1176345338.

Tell me more

Read more about Bayesian bootstrap in [R] bayesboot and the rwgen command in [R] rwgen in the Stata Base Reference Manual.

Learn more about Stata's resampling features.

View all the new features in Stata 19, and, in particular, new in resampling.

Ready to get started?

Experience powerful statistical tools, reproducible workflows, and a seamless user experience—all in one trusted platform.