Note: The following question and answer is based on an exchange that started on Statalist.
Title:   Guidelines for bootstrap samples
Authors: William Gould, StataCorp
         Jeff Pitblado, StataCorp
I am running a negative binomial regression on a sample of 488 firms. For various reasons [...], I decided to use the bootstrapping procedure in Stata on my data. Are there general guidelines that have been proposed for how large the bootstrapped samples should be relative to the total number of cases in the dataset from which they are drawn?
When using the bootstrap to estimate standard errors and to construct confidence intervals, the resamples should be the same size as the original sample. Consider a simple example where we wish to bootstrap the coefficient on foreign from a regression of mpg on weight and foreign using the automobile data. The sample size is 74, but suppose we draw only 37 observations (half the observed sample size) each time we resample the data 2,000 times.
. sysuse auto, clear

. set seed 3957574

. bootstrap _b[foreign], size(37) reps(2000) dots: regress mpg weight foreign
(running regress on estimation sample)

Bootstrap replications (2,000): .........10.........20.........30.........40....
> .....50.........60.........70.........80.........90.........100.........110...
  (output omitted)

Linear regression                                Number of obs     =         74
                                                 Replications      =      2,000

      Command: regress mpg weight foreign
        _bs_1: _b[foreign]

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -1.650029   1.661728    -0.99   0.321    -4.906956   1.606898
------------------------------------------------------------------------------
Now consider the same exercise with full-size resamples of 74 observations.
. set seed 91857785

. bootstrap _b[foreign], reps(2000) dots: regress mpg weight foreign
(running regress on estimation sample)

Bootstrap replications (2,000): .........10.........20.........30.........40....
> .....50.........60.........70.........80.........90.........100.........110...
  (output omitted)

Linear regression                                Number of obs     =         74
                                                 Replications      =      2,000

      Command: regress mpg weight foreign
        _bs_1: _b[foreign]

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -1.650029   1.121612    -1.47   0.141    -3.848348   .5482899
------------------------------------------------------------------------------
As explained below, the difference in the bias estimates is due to the random nature of the bootstrap, not to the number of observations drawn in each replication. The standard error estimates, however, do depend on the number of observations in each replication. Here, on average, we would expect the variance estimate of _b[foreign] to be twice as large for resamples of 37 observations as for resamples of 74. This follows from the form of the variance of the sample mean, s^2/n: halving n doubles s^2/n.
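To see the s^2/n effect numerically, here is a small Python sketch. It is a generic illustration with simulated data (not the auto dataset and not Stata's bootstrap command): it bootstraps the standard error of a sample mean using full-size and half-size resamples and shows that the half-size standard error is inflated by roughly sqrt(2).

```python
import random
import statistics

def boot_se_of_mean(data, resample_size, reps, seed):
    """Bootstrap standard error of the sample mean, drawing
    `resample_size` observations with replacement per replication."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.choices(data, k=resample_size))
        for _ in range(reps)
    ]
    return statistics.stdev(means)

# 74 simulated observations, mimicking the auto data's sample size
rng = random.Random(42)
data = [rng.gauss(21, 5) for _ in range(74)]

se_full = boot_se_of_mean(data, 74, 5000, seed=1)   # full-size resamples
se_half = boot_se_of_mean(data, 37, 5000, seed=2)   # half-size resamples

# The ratio should be close to sqrt(74/37) = sqrt(2), about 1.41
print(round(se_half / se_full, 2))
```

Because each bootstrap SE is itself a Monte Carlo estimate, the printed ratio will wobble around 1.41 from run to run, which is exactly the replication noise discussed below.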
The number of observations in the original underlying dataset does not play a role in determining the number of replications required to get good bootstrap variance estimates. The dataset must have enough observations (preferably an infinite number) so that the empirical distribution can be used as an approximation to the population's true distribution.
In terms of the number of replications, there is no fixed answer such as “250” or “1,000” to the question. The formal answer is that you should choose an infinite number of replications because, at a formal level, that is what the bootstrap requires. The key to the usefulness of the bootstrap is that it converges reasonably quickly in the number of replications, so running a finite number of replications is good enough—assuming the number chosen is large enough.
The above statement contains the key to choosing the right number of replications. Here is the recipe: choose a number of replications and run the bootstrap twice using different random-number seeds. If the two sets of results are close enough for your purposes, that number of replications is sufficient; if not, increase the number of replications and repeat the experiment.
Whether results change meaningfully is a matter of judgment and has to be interpreted given the problem at hand. How accurate do you need the standard errors, confidence intervals, etc.? Often, a few digits of precision is good enough because, even if you had the standard error calculated perfectly, you have to ask yourself how much you believe your model in terms of all the other assumptions that went into it. For instance, in a Becker earnings model of the return to schooling, you might tell me return is 6% with a standard error of 1, and I might believe you. If you told me the return is 6.10394884% and the standard error is .9899394, you have more precision but have not provided any additional useful information.
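The run-twice check can be sketched in Python (again a generic illustration with simulated data, not Stata's bootstrap command): estimate the bootstrap standard error of a mean under two different seeds at increasing replication counts, and watch the seed-to-seed disagreement shrink as replications grow.

```python
import random
import statistics

def bootstrap_se(data, reps, seed):
    """Bootstrap SE of the mean using full-size resamples."""
    rng = random.Random(seed)
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(reps)]
    return statistics.stdev(means)

# Simulated sample of 74 observations
rng = random.Random(7)
data = [rng.gauss(0, 1) for _ in range(74)]

# Same data, two seeds: the gap between the two SE estimates
# shrinks as the number of replications increases
for reps in (50, 200, 1000, 5000):
    a = bootstrap_se(data, reps, seed=1)
    b = bootstrap_se(data, reps, seed=2)
    print(reps, round(abs(a - b), 4))
```

When the gap at a given replication count is smaller than the precision you care about, that count is large enough for your problem.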
If you want more precision, it may take more replications than you would guess. Using the automobile data, I looked at linear regression,
. regress mpg weight foreign
and obtained the bootstrapped standard error for _b[foreign]. I did this for 20 replications, then 40, 60, and so on, all the way up to 4,000. Here is a graph of the results as a function of the number of replications:

[Graph: bootstrap standard error for _b[foreign] versus number of replications]
The vertical axis shows the bootstrapped standard error for _b[foreign]. Even with more than 1,000 replications, the standard error varied between 1.10 and 1.20, and 90% of the results were between 1.11 and 1.18. As a side experiment, I ran
. bootstrap _b[foreign], reps(20000): regress mpg weight foreign
twice and got reported standard errors of 1.14 and 1.16. With 40,000 replications, I got a reported standard error of 1.14.
Here is the program I used to obtain the above graph:
capture program drop Accum
program Accum
        postfile results se bias n using sim, replace
        forvalues n = 20(20)4000 {
                noisily display " `n'" _c
                quietly bootstrap _b[foreign] e(N), reps(`n'): ///
                        regress mpg weight foreign
                tempname bias
                matrix `bias' = e(bias)
                local b_bias = `bias'[1,1]
                local n = e(N_reps)
                local se = _se[_bs_1]
                post results (`se') (`b_bias') (`n')
        }
        postclose results
end

clear
sysuse auto
set seed 12345
Accum
use sim, clear
scatter se n, xtitle("replications") ytitle("bootstrap standard error")