Title: Bootstrapping vectors
Author: Jeffrey Pitblado, StataCorp
Note: This FAQ has been updated for Stata 14. Because bootstrap is based on random draws, its results differ from those produced by earlier versions, which did not use the 64-bit Mersenne Twister pseudorandom-number generator introduced in Stata 14.
I have a program that calculates many statistics for each income quintile in my sample. To make this manageable, I store the estimates as variables—each statistic that I've calculated is a variable, and there are 5 observations, one for each quintile. I want to bootstrap the results, but it appears that bootstrap works only for scalars. I could break up the variables into scalars so that the call to bootstrap would be
. bootstrap (stat1[1]) (stat1[2]) (stat1[3]) (stat1[4]) ///
            (stat1[5]) (stat2[1]) (stat2[2]): command ...
but that would be incredibly tedious because there are a lot of statistics. Is there any way to simplify this by posting variables (or vectors) of results?
The bootstrap command understands _b to mean all elements in the e(b) vector (coefficients vector posted by estimation commands). For example, you can now easily bootstrap all the coefficients from a regression:
. bootstrap _b: regress mpg weight length ...
To take advantage of this syntax, you will have to modify your program so that it is an e-class command that posts the values of interest into e(b) instead of placing them in variables. Then, you can do something like
. bootstrap _b, reps(100): command
Here is an example that posts the vector (1,2,3,4) to e(b):
capture program drop myepost
program myepost, eclass
        version 13.0
        tempname bb
        matrix `bb' = 1,2,3,4
        ereturn post `bb'
end

myepost
matrix list e(b)
Here is a log of the result:
. myepost

. matrix list e(b)

e(b)[1,4]
    c1  c2  c3  c4
y1   1   2   3   4
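The default column names c1 through c4 carry no meaning, so it can help to label the elements of the vector before posting it. The following is a minimal sketch of that idea, not part of the original FAQ: the program name myepost2 and the labels q1 through q4 are hypothetical, chosen only for illustration.

```stata
capture program drop myepost2
program myepost2, eclass
        version 13.0
        tempname bb
        matrix `bb' = 1,2,3,4
        * hypothetical labels; each column of e(b) gets a readable name
        matrix colnames `bb' = q1 q2 q3 q4
        ereturn post `bb'
end

myepost2
matrix list e(b)
```

When such a program is bootstrapped with _b, the labels should appear as the row names of the coefficient table, which makes a long list of statistics easier to read.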
Now you can use the same idea to pass a vector of results to bootstrap. To check that the method works, you can post the coefficients of a regression and compare the results with those from bootstrapping regress directly under the same seed:
capture program drop myreg
program myreg, eclass
        version 13.0
        tempname bb
        quietly regress mpg turn
        matrix `bb'=e(b)
        ereturn post `bb'
        ereturn local cmd="bootstrap"
end

clear
sysuse auto
set seed 12345
bootstrap _b, reps(50) nowarn: myreg
set seed 12345
bootstrap _b, reps(50): regress mpg turn
Here is a log of the result:
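. bootstrap _b, reps(50) nowarn: myreg
(running myreg on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        turn |  -.9457877   .1083031    -8.73   0.000    -1.158058   -.7335175
       _cons |    58.7965    4.67459    12.58   0.000     49.63448    67.95853
------------------------------------------------------------------------------

. bootstrap _b, reps(50): regress mpg turn

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
         mpg | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        turn |  -.9457877   .1083031    -8.73   0.000    -1.158058   -.7335175
       _cons |    58.7965    4.67459    12.58   0.000     49.63448    67.95853
------------------------------------------------------------------------------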
There is one difference between the first program, myepost, and myreg: myreg also sets e(cmd) to "bootstrap". This is necessary so that bootstrap knows how to display the results. When you bootstrap an official Stata estimation command, bootstrap uses that command's replay feature to display the coefficient table, which shows the bootstrapped standard errors because bootstrap posts the bootstrapped covariance matrix in e(V). Because myreg has no replay feature, bootstrap must display the results itself, and setting e(cmd)="bootstrap" tells it to do so.
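To connect this back to the original question, here is a rough sketch of how a per-quintile statistic might be posted to e(b) and bootstrapped. It is an illustration under stated assumptions, not the questioner's program: the name quintmeans is hypothetical, price in the auto dataset stands in for income, and the mean of mpg stands in for the statistic of interest.

```stata
capture program drop quintmeans
program quintmeans, eclass
        version 13.0
        tempname bb
        tempvar q
        * assumption: price stands in for income; quintiles are
        * recomputed within each bootstrap sample
        xtile `q' = price, nquantiles(5)
        matrix `bb' = J(1, 5, .)
        forvalues i = 1/5 {
                * assumption: mean of mpg is the statistic of interest
                quietly summarize mpg if `q' == `i', meanonly
                matrix `bb'[1, `i'] = r(mean)
        }
        * hypothetical labels, one per quintile
        matrix colnames `bb' = q1 q2 q3 q4 q5
        ereturn post `bb'
        * no replay feature, so let bootstrap display the results
        ereturn local cmd = "bootstrap"
end

clear
sysuse auto
set seed 12345
bootstrap _b, reps(50) nowarn: quintmeans
```

Additional statistics could be handled the same way by widening the row vector and extending the column labels, so that a single call to bootstrap _b covers every statistic for every quintile.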