
Bootstrap sampling and estimation

  • Bootstrap of Stata commands
  • Bootstrap of user-written programs
  • Standard errors and bias estimation

Stata’s programmability makes it possible to perform bootstrap sampling and estimation (see Efron 1979, 1982; Efron and Tibshirani 1993; Mooney and Duval 1993). We provide two commands to simplify bootstrap estimation: bsample and bootstrap. bsample draws a sample with replacement from a dataset and may be used in user-written programs.
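As a minimal sketch of bsample on its own, assuming the auto dataset used in the examples on this page and an arbitrary seed, the following draws one bootstrap sample and computes the median of mpg in it:

```stata
* Minimal sketch: one bootstrap sample drawn by hand (the seed is arbitrary)
webuse auto, clear
set seed 1234

* Replace the data in memory with a same-size sample drawn with replacement
bsample

* Median of mpg in this single bootstrap sample
quietly summarize mpg, detail
display r(p50)
```

Because bsample replaces the data in memory, it is usually wrapped in preserve and restore when the original data are needed again.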

It is easier, however, to perform bootstrap estimation using the bootstrap prefix command. bootstrap lets you supply an expression that is a function of the saved results of an existing command, or you can write a program to calculate the statistics of interest. bootstrap then repeatedly draws a sample with replacement, runs the command or user-written program, collects the results into a new dataset, and presents the results. The calculation program is easy to write because every Stata command saves the statistics it calculates.
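The steps that bootstrap automates can be sketched by hand with bsample and postfile. This is an illustration of the idea under simple assumptions (the auto dataset, 200 replications, an arbitrary seed, and a variable name, median, chosen for this example), not a description of how bootstrap is implemented internally:

```stata
* Hand-rolled bootstrap of the median of mpg (illustrative sketch only)
webuse auto, clear
set seed 1234

* Temporary handle and file that will collect one median per replication
tempname results
tempfile bsdata
postfile `results' median using `bsdata'

forvalues i = 1/200 {
    preserve                          // keep the original data safe
    bsample                           // resample with replacement
    quietly summarize mpg, detail     // compute the statistic
    post `results' (r(p50))           // store it
    restore                           // bring the original data back
}
postclose `results'

* The collected replicates form a new dataset with one row per replication
use `bsdata', clear
summarize median
```

The standard deviation reported by the final summarize is the bootstrap estimate of the standard error of the median.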

For instance, assume that we wish to obtain the bootstrap estimate of the standard error of the median of a variable called mpg. Stata has a built-in command, summarize, that calculates and displays summary statistics; it calculates means, standard deviations, skewness, kurtosis, and various percentiles. Among those percentiles is the 50th percentile—the median. In addition to displaying the calculated results, summarize saves them, and looking in the manual, we discover that the median is saved in r(p50). To get a bootstrap estimate of its standard error, all we need to do is type

  . bootstrap r(p50), reps(1000): summarize mpg, detail

and bootstrap will do all of the work for us. We'll also specify a seed() option so that you can reproduce our results.

  . webuse auto
  (1978 Automobile Data)

  . bootstrap r(p50), reps(1000) seed(1234): summarize mpg, detail
  (running summarize on estimation sample)

  (output omitted)

  Bootstrap results                               Number of obs      =        74
                                                  Replications       =      1000

        command:  summarize mpg, detail
          _bs_1:  r(p50)

  ------------------------------------------------------------------------------
               |   Observed   Bootstrap                         Normal-based
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
         _bs_1 |         20    .963566    20.76   0.000     18.11145    21.88855
  ------------------------------------------------------------------------------
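To see which results a command leaves behind, and so what can go into the expression given to bootstrap, return list can be run right after the command. The sketch below also names the bootstrapped statistics (the names median and mean are our own choice for illustration) so that the output table shows those labels instead of _bs_1:

```stata
webuse auto, clear

* List everything summarize saves in r(), including r(p50)
quietly summarize mpg, detail
return list

* Bootstrap two saved results at once, giving each a readable name
bootstrap median=r(p50) mean=r(mean), reps(1000) seed(1234): summarize mpg, detail
```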

Use the estat bootstrap postestimation command to report a table with alternative confidence intervals and an estimate of bias.

  . estat bootstrap, all

  Bootstrap results                               Number of obs      =        74
                                                  Replications       =      1000

        command:  summarize mpg, detail
          _bs_1:  r(p50)

  ------------------------------------------------------------------------------
               |    Observed               Bootstrap
               |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
  -------------+----------------------------------------------------------------
         _bs_1 |          20       .187   .96356601    18.11145   21.88855   (N)
               |                                             19         22   (P)
               |                                             19         22  (BC)
  ------------------------------------------------------------------------------
  (N)    normal confidence interval
  (P)    percentile confidence interval
  (BC)   bias-corrected confidence interval
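The normal-based row (N) can be checked by hand: it is the observed value plus or minus the standard normal critical value times the bootstrap standard error. A quick check using the numbers reported in the table above:

```stata
* Normal-based 95% interval: observed +/- z * bootstrap standard error
display 20 - invnormal(0.975)*.963566
display 20 + invnormal(0.975)*.963566

* z statistic reported in the bootstrap table: observed / standard error
display 20/.963566
```

These should agree with the (N) limits and the z value shown above to the displayed precision. The percentile and bias-corrected intervals are based on percentiles of the replicates rather than on this formula.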

For an example of when we would need to write a program, consider the case of bootstrapping the ratio of two means.

We first define the calculation routine, which we can name whatever we wish,

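  program myratio, rclass
          version 9
          * mean of length, kept in a local macro
          summarize length
          local length = r(mean)
          * mean of turn, kept in a second local macro
          summarize turn
          local turn = r(mean)
          * return the ratio of the two means as r(ratio)
          return scalar ratio = `length'/`turn'
  end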

Our program calls summarize and stores the mean of the variable length in a local macro. The program then repeats this procedure for the second variable turn. Finally, the ratio of the two means is computed and returned by our program in the saved result we call r(ratio).
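Before bootstrapping, it can help to run the program once on the original data and confirm that it returns what we expect. A minimal sketch, assuming the auto dataset is used as in the earlier example (quietly is added here only to suppress the summarize output):

```stata
* Define the program (dropping any earlier copy so the block can be rerun)
capture program drop myratio
program myratio, rclass
        version 9
        quietly summarize length
        local length = r(mean)
        quietly summarize turn
        local turn = r(mean)
        return scalar ratio = `length'/`turn'
end

* Run it once on the full sample and inspect the saved result
webuse auto, clear
myratio
return list
```

The value listed for r(ratio) is the statistic computed on the original sample, which bootstrap reports as the observed coefficient.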

With our program written, we can now obtain the bootstrap estimate by simply typing

  . bootstrap r(ratio), reps(#): myratio

This means that we will execute bootstrap with our myratio program for # replications. Below we request 1,000 replications and specify a random-number seed so you can reproduce our results:

  . bootstrap r(ratio), reps(1000) seed(4567): myratio
  (running myratio on estimation sample)

  (output omitted)

  Bootstrap results                               Number of obs      =        74
                                                  Replications       =      1000

        command:  myratio
          _bs_1:  r(ratio)

  ------------------------------------------------------------------------------
               |   Observed   Bootstrap                         Normal-based
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
         _bs_1 |   4.739945   .0344786   137.47   0.000     4.672369    4.807522
  ------------------------------------------------------------------------------

The ratio, calculated over the original sample, is 4.739945; the bootstrap estimate of the standard error of the ratio is 0.0344786. Had we wanted to keep the 1,000-observation dataset of bootstrapped results for subsequent analysis, we would have typed

  . bootstrap r(ratio), reps(1000) seed(4567) saving(mydata): myratio
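As a sketch of what that subsequent analysis might look like, the block below reruns the command with saving() and then loads the saved replicates. The replace suboption and the particular summaries are our own additions for illustration; the saved variable is expected to carry the default name _bs_1 shown in the output table.

```stata
* Define the program (dropping any earlier copy so the block can be rerun)
capture program drop myratio
program myratio, rclass
        version 9
        quietly summarize length
        local length = r(mean)
        quietly summarize turn
        local turn = r(mean)
        return scalar ratio = `length'/`turn'
end

* Run the bootstrap and keep the replicates in mydata.dta
webuse auto, clear
bootstrap r(ratio), reps(1000) seed(4567) saving(mydata, replace): myratio

* Load the replicates: one observation per replication
use mydata, clear
describe

* The standard deviation of the replicates is the bootstrap standard error
summarize _bs_1

* 2.5th and 97.5th percentiles of the replicates
_pctile _bs_1, percentiles(2.5 97.5)
display r(r1) "  " r(r2)
```

The standard deviation from summarize should match the bootstrap standard error reported in the table above.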

bootstrap can be used with any Stata estimator or calculation command and even with user-written calculation commands.

We have found bootstrap particularly useful in obtaining estimates of the standard errors of quantile-regression coefficients. Stata performs quantile regression and obtains the standard errors using the method suggested by Koenker and Bassett (1978, 1982). Rogers (1992) reports that these standard errors are satisfactory in the homoskedastic case but that they appear to be understated in the presence of heteroskedastic errors. One alternative is to bootstrap the estimated coefficients to obtain the standard errors. For instance, say that you wish to estimate a median regression of price on variables weight, length, and foreign. Typing qreg price weight length foreign will produce the estimates along with Koenker–Bassett standard errors. To obtain bootstrap standard errors, we could issue the command

  . bootstrap, reps(#): qreg price weight length foreign

We recommend this procedure so highly that Gould (1992) wrote a new command in Stata’s programming language to further automate this procedure for quantile regression. Typing bsqreg price weight length foreign will also produce the bootstrapped results.
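The three approaches can be run side by side for comparison. This is a sketch assuming the auto dataset; the seed and the 200 replications are arbitrary choices for illustration, and the two sets of bootstrap standard errors need not be identical.

```stata
webuse auto, clear

* Median regression with the default standard errors
qreg price weight length foreign

* Same model with bootstrap standard errors via the prefix command
bootstrap, reps(200) seed(1234): qreg price weight length foreign

* Same model via bsqreg; set the seed first so the results can be reproduced
set seed 1234
bsqreg price weight length foreign, reps(200)
```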



References

Efron, B. 1979.
Bootstrap methods: another look at the jackknife. Annals of Statistics 7: 1–26.
------. 1982.
The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.
Efron, B., and R. J. Tibshirani. 1993.
An Introduction to the Bootstrap. New York: Chapman & Hall.
Gould, W. 1992.
sg11.1: Quantile regression with bootstrapped standard errors. Stata Technical Bulletin 9: 19–21. Reprinted in Stata Technical Bulletin Reprints, vol. 2, pp. 137–139.
Koenker, R., and G. Bassett, Jr. 1978.
Asymptotic theory of least absolute error regression. Journal of the American Statistical Association 73: 618–622.
------. 1982.
Robust tests for heteroskedasticity based on regression quantiles. Econometrica 50: 43–61.
Mooney, C. Z., and R. D. Duval. 1993.
Bootstrapping: A Nonparametric Approach to Statistical Inference. Newbury Park, CA: Sage.
Rogers, W. H. 1992.
sg11: Quantile regression standard errors. Stata Technical Bulletin 9: 16–19. Reprinted in Stata Technical Bulletin Reprints, vol. 2, pp. 133–137.
© Copyright 1996–2009 StataCorp LP