Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Efficient parallel computing in Stata/MP
From: Alan Riley <[email protected]>
To: [email protected]
Subject: Re: st: Efficient parallel computing in Stata/MP
Date: Fri, 27 Sep 2013 21:40:35 -0500
Demian ([email protected]) needs to run many regressions and
wonders whether it is better to run a single Stata/MP instance,
running all regressions in it, or multiple instances, each
running a subset of the regressions.
The original question was
> Dear Statalist members: I need some help, because I'm not sure about
> some Stata/MP properties for parallel computing.
> We know from http://www.stata.com/statamp/statamp.pdf that many
> estimation commands (e.g. regress) are almost fully parallelizable and
> that average efficiency for all commands is around 72%. So, in
> standard linear regression problems (e.g running one million equations
> for parameter stability analysis), using Stata/MP in a multiple-core
> CPU would be an optimal time saving strategy.
> However, it is also possible to exploit the multi-core CPU environment
> by working with multiple parallel Stata/MP instances (e.g. using 4
> parallel Stata/MP instances to run 250.000 linear regressions with
> each core).
> My question is simple.... Can I save some time by using this "dual
> parallelism" methodology? (because parallel computing is
> automatically used by Stata/MP to parallelize internal tasks of, for
> example, regress; and because I also parallelize the whole set of
> regressions between 4 cores, by means of multiple Stata/MP instances).
The answer is ... it depends. More information is needed to be
able to answer this question. In later emails in this thread,
Demian mentioned panels -- so it is unclear to me whether only
the -regress- command is desired or whether some other estimation
command is being executed.
The next question is how many variables and observations each
regression has. If Demian is using -regress-, and has even
a moderate number of observations, the best solution may simply be
to use a single instance of Stata/MP utilizing all cores on his machine
to run all the regressions sequentially. -regress- is almost
perfectly parallelized, so as long as there are enough observations
in each regression, there would be no point in launching multiple
instances of Stata to run separate regressions.
If Demian does wish to run multiple instances of Stata, each running
a separate set of regressions, rather than a single instance of
Stata/MP running all regressions, I would recommend that he use
1 core per instance of Stata rather than having multiple Stata/MP
instances competing against each other for the available cores
on the computer. As others have pointed out in this thread, if Demian
wants to run separate instances of Stata, I would also recommend that
he take a look at the 'parallel' prefix command by George Vega Yon,
which he presented at the 2013 Stata Conference in New Orleans:
http://www.stata.com/meeting/new-orleans13/abstracts/materials/nola13-vega.pdf
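
For readers who want to try the one-core-per-instance approach by hand, here is a minimal sketch of a worker do-file. It is an illustration only, not the 'parallel' package and not a recommended implementation: the file name worker.do, the arguments first and last, the auto dataset, the placeholder model, and the results file names are all assumptions made for this example.

```stata
* worker.do: hypothetical per-instance do-file (all names are illustrative)
* Launch one copy per core, for example from a Unix-like shell:
*     stata-mp -b do worker.do 1 1000
args first last                 // range of regression indices for this instance

if c(MP) {
    set processors 1            // restrict this instance to a single core
}

sysuse auto, clear              // stand-in for the real dataset

* Collect one row of results per regression in a separate file per instance
tempname results
postfile `results' long id double b_weight using results_`first'_`last', replace

forvalues i = `first'/`last' {
    * Placeholder model: a real job would vary the sample or regressors with `i'
    quietly regress price weight length
    post `results' (`i') (_b[weight])
}
postclose `results'
```

Each instance writes its own results file, so the files still have to be combined afterwards (for example with -append-), which is part of the extra setup cost of running multiple jobs.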
Given the number of regressions Demian needs to run (in another
email he mentioned 100 million regressions), I would recommend that
he experiment. Assuming he has an 8-core machine, he should run,
say, 8000 regressions one after another in a single instance of
Stata/MP using all 8 cores. Then he should run the same 8000
regressions split across 8 instances of Stata, 1000 per instance, with
each instance using 1 core. He can then compare the timings and decide
how to proceed.
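
The single-instance half of that experiment could be timed along the following lines. This is a sketch under stated assumptions: the auto dataset and the model are placeholders, and with so few observations Stata/MP has little to parallelize, so the real data and the real regressions should be substituted before drawing any conclusion.

```stata
* time_batch.do: hypothetical timing harness for the single-instance run
sysuse auto, clear              // placeholder data; use the real dataset instead

if c(MP) {
    set processors `c(processors_max)'   // use every core available to Stata/MP
}
display "Using " c(processors) " processor(s)"

timer clear 1
timer on 1
forvalues i = 1/8000 {
    * Placeholder model: replace with the regressions actually being run
    quietly regress price weight length
}
timer off 1
timer list 1                    // reports elapsed seconds for timer 1
```

The same -timer- wrapper can be placed around the loop in each one-core instance; the figure to compare against is the wall-clock time until the slowest of the 8 instances finishes, not the sum of the individual times.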
As I said above, -regress- is so nearly perfectly parallelized that
even with a moderate number of observations, a single instance of
Stata/MP may well be the best way to go, and it is certainly
easier to set up a single job rather than to run multiple simultaneous
jobs and combine the results of each.
If Demian has further questions, or would like to discuss the problem
in more detail, he can email Technical Services at [email protected]
and we'll be happy to give him advice.
Alan
[email protected]