Use bootstrap in parallel on multiple computers
Use simulate and run Monte Carlo simulations in parallel on multiple computers
Simultaneously draw random numbers from up to 32,768 separate instances of Stata
Bootstrapping and Monte Carlo simulation use random numbers to perform the same calculations over and over again. So do some other statistical procedures. With a little organization of your work, you can perform these kinds of calculations simultaneously on different computers. The problem is generating the random numbers. If you are running on 12 different computers, are you going to use 12 different seeds? Even that will not guarantee correct use of pseudorandom-number generators, because nothing prevents the sequences started from different seeds from overlapping.
Stream random-number generators solve this problem. You set one seed and specify stream 1 on the first computer, stream 2 on the second, and so on.
Stata provides a stream version of the 64-bit Mersenne Twister, Stata's default pseudorandom-number generator.
Computer random numbers are elements in a sequence of deterministic numbers that only appear to be random. A seed specifies an entry point into this sequence. See figure 1. Each tick is an element in the sequence—a "random" number. Setting the seed to 12345 means that the tick identified by the arrow will be the next "random" number drawn.
When using ordinary (serial) random-number generators, there is no way to specify different seeds that ensure the corresponding random samples drawn from the sequence do not overlap. You cannot simply run different bootstrap or Monte Carlo draws over different computers using serial random-number generators.
Stream random-number generators solve this problem by partitioning the sequence into nonoverlapping subsequences known as streams, as shown in figure 2.
Setting the seed to 12345 for the stream random-number generator enters at the same place as previously. The stream random-number generator, however, also partitions the Mersenne Twister sequence into 32,768 streams, each containing 2^128 random numbers.
When you use Stata's stream random-number generator, you specify a stream number and a seed, in that order.
To draw numbers from the stream Mersenne Twister random-number generator, set the stream and set the seed:
. set rngstream 10
. set seed 123456
After that, use Stata's runiform() function—or any of its other random-number functions—just as you ordinarily would:
. generate u = runiform()
Or use Stata's bootstrap or simulate commands, which automate computing bootstrap standard errors and running Monte Carlo simulations.
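As a quick check of the idea, the following sketch draws five uniform numbers from stream 1 and five from stream 2 using the same seed. The variable names u1 and u2 are illustrative only. Because the streams are nonoverlapping subsequences, the two columns should not match even though the seed is identical.

```stata
* Sketch only: same seed, two different streams
clear
set obs 5

set rngstream 1
set seed 12345
generate u1 = runiform()

set rngstream 2
set seed 12345
generate u2 = runiform()

* Optional: show which generator is currently in use
display c(rng_current)

list u1 u2
```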
We created two do-files:
------------------------------------- file1.do ---
set rngstream 1
set seed 12345
sysuse auto
bootstrap, reps(100) saving(machine1, replace): ///
        regress mpg weight gear foreign
--------------------------------------------------
------------------------------------- file2.do ---
set rngstream 2
set seed 12345
sysuse auto
bootstrap, reps(100) saving(machine2, replace): ///
        regress mpg weight gear foreign
--------------------------------------------------
The do-files are nearly identical. One says stream 1 and machine1 and the other says stream 2 and machine2. Using Stata's programming features, we could have written just one do-file.
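One way the single do-file might look is sketched below. It assumes the stream number is passed as an argument when the do-file is run. The filename bsstream.do and the macro name stream are hypothetical, not part of the original example.

```stata
* bsstream.do (hypothetical filename)
* Usage on machine k:  do bsstream k
args stream

set rngstream `stream'
set seed 12345
sysuse auto, clear

* Each machine saves its replicates to its own file: machine1, machine2, ...
bootstrap, reps(100) saving(machine`stream', replace): ///
        regress mpg weight gear foreign
```

Typing do bsstream 1 on the first computer and do bsstream 2 on the second would then play the roles of file1.do and file2.do.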
We ran file1.do on computer 1 and file2.do on computer 2.
We copied the resulting dataset, machine2.dta, from computer 2 to computer 1, on which we already had machine1.dta.
And now, we obtain our combined results:
. clear all

. use machine1
(bootstrap: regress)

. append using machine2

. bstat

Bootstrap results                               Number of obs     =         74
                                                Replications      =        200

      Command: regress mpg weight gear foreign

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   -.006139   .0005678   -10.81   0.000    -.0072519   -.0050262
  gear_ratio |   1.457113   1.266586     1.15   0.250     -1.02535    3.939577
     foreign |  -2.221682   1.187847    -1.87   0.061    -4.549819    .1064562
       _cons |   36.10135   4.562644     7.91   0.000     27.15873    45.04397
------------------------------------------------------------------------------
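With more than two machines, the append step can be written as a loop. The sketch below assumes four result files named machine1.dta through machine4.dta have been copied into the current directory, each produced with a different stream. The count of four and the macro name nmachines are illustrative.

```stata
* Sketch: pool bootstrap replicates saved on several machines
local nmachines 4

use machine1, clear
forvalues k = 2/`nmachines' {
    append using machine`k'
}

* Report bootstrap results from the pooled replicates
bstat
```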
Each machine ran 100 of the 200 replications, so for computationally intensive problems, the two-machine time will be about one-half the one-machine time. More generally, using distinct streams on different computers can dramatically reduce the time required for such problems.
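The same pattern should carry over to simulate. The sketch below is a hypothetical Monte Carlo do-file: the program onedraw, the filename simstream.do, the sample size of 50, and the 1,000 replications are all assumptions chosen for illustration. Each machine runs it with its own stream number and saves its results to a separate file.

```stata
* simstream.do (hypothetical filename)
* Usage on machine k:  do simstream k
args stream

capture program drop onedraw
program define onedraw, rclass
    * One replication: the mean of 50 standard normal draws
    drop _all
    set obs 50
    generate x = rnormal()
    summarize x, meanonly
    return scalar mean = r(mean)
end

set rngstream `stream'
set seed 12345
simulate mean = r(mean), reps(1000) saving(sim`stream', replace): onedraw
```

The resulting sim1.dta, sim2.dta, and so on can then be combined on one computer with append, as was done for the bootstrap results.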
Read more about the stream random-number generator in [R] set rngstream.