Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: Efficient parallel computing in Stata/MP
From
Daniel Feenberg <[email protected]>
To
"[email protected]" <[email protected]>
Subject
st: RE: Efficient parallel computing in Stata/MP
Date
Fri, 27 Sep 2013 07:43:36 -0400 (EDT)
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Demian Panigo
Sent: 27 September 2013 01:02
To: [email protected]
Subject: st: Efficient parallel computing in Stata/MP
Dear Statalist members: I need some help, because I'm not sure about
some properties of Stata/MP for parallel computing.
We know from http://www.stata.com/statamp/statamp.pdf that many
estimation commands (e.g. regress) are almost fully parallelizable and
that the average efficiency across all commands is around 72%. So, in
standard linear regression problems (e.g. running one million equations
for parameter stability analysis), using Stata/MP on a multi-core
CPU would be an optimal time-saving strategy.
However, it is also possible to exploit the multi-core CPU environment
by working with multiple parallel Stata/MP instances (e.g. using 4
parallel Stata/MP instances, each running 250,000 linear regressions
on its own core).
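As a minimal sketch of the split described above (written in Python rather than Stata, and not taken from any Stata tool), the hypothetical helper `split_ranges` divides a set of independent regressions into contiguous index blocks, one per instance. It assumes the regressions are numbered from 0 and are fully independent, so each block can run in any order.

```python
# Hypothetical sketch: divide n_tasks independent regressions into
# contiguous blocks, one block per Stata instance (or worker process).
def split_ranges(n_tasks: int, n_workers: int) -> list[tuple[int, int]]:
    """Return half-open (start, stop) index ranges covering 0..n_tasks-1.

    Block sizes differ by at most one when n_tasks is not divisible by
    n_workers; the first `remainder` blocks each get one extra task.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    base, remainder = divmod(n_tasks, n_workers)
    ranges = []
    start = 0
    for i in range(n_workers):
        size = base + (1 if i < remainder else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # One million regressions shared by 4 instances: 250,000 each.
    for worker, (start, stop) in enumerate(split_ranges(1_000_000, 4), 1):
        print(f"instance {worker}: regressions {start + 1}..{stop}")
```

Each instance would then loop only over its own block, for example after receiving its start and stop indices as arguments to its do-file; how the blocks are handed out is an assumption here, not part of the original question.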
My question is simple: can I save some time by using this "dual
parallelism" approach? (Stata/MP automatically parallelizes the
internal tasks of, for example, regress, and I would also be splitting
the whole set of regressions across 4 cores by means of multiple
Stata/MP instances.)
Thanks in advance
*
In my experience, Stata/MP fully exploits as many real cores as are
available, and does so very efficiently for regression commands. If you
have hyper-cores (hyper-threaded virtual cores), running multiple Stata
jobs will exploit those efficiently as well. I posted the results of a
simple experiment at:
http://www.nber.org/stata/efficient
under the heading "Stata/MP".
-parallel.ado- is a very interesting routine. It starts multiple Stata
processes, lets each one work on a part of the dataset, and then combines
the results into a single dataset. For processes that are single-threaded
for no good reason, or if you don't have Stata/MP, it seems like a great
idea. I believe it would also work well with hyper-cores, but I have no
experience with it. For I/O-bound work, however, it would just make
things worse, since each process has to read the entire dataset.
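The split/process/combine pattern described above can be sketched in Python as follows. This is not how -parallel.ado- is implemented; `transform_part` and `split_apply_combine` are hypothetical names, and the per-part job is a placeholder for whatever each Stata process would compute. `map_fn` defaults to the built-in `map` (sequential); passing a process pool's `map` runs the parts in separate processes.

```python
from concurrent.futures import ProcessPoolExecutor


def transform_part(part):
    """Hypothetical per-part job: compute a new value for every row."""
    return [x * x for x in part]


def split_apply_combine(rows, func, n_parts, map_fn=map):
    """Split rows into contiguous parts, apply func to each part via
    map_fn, and append the partial results back together in order."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    if not rows:
        return []
    size = -(-len(rows) // n_parts)  # ceiling division: rows per part
    parts = [rows[i:i + size] for i in range(0, len(rows), size)]
    combined = []
    # Both built-in map and Executor.map yield results in submission
    # order, so the combined output keeps the original row order.
    for partial in map_fn(func, parts):
        combined.extend(partial)
    return combined


if __name__ == "__main__":
    data = list(range(1, 11))
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(split_apply_combine(data, transform_part, 4, pool.map))
```

In this sketch each worker receives only its own slice in memory; the I/O caveat above applies when every process must load the full dataset from disk itself.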
See
http://www.stata.com/statamp/report.pdf
for a more discouraging report on hyper-cores. I can't explain the
difference between those results and mine. There is no substitute for
experimentation on your actual hardware, and this list would be
interested in hearing about your experience.
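One way to run the experiment suggested above is to time a fixed CPU-bound workload at several worker counts and compare. The sketch below is a hypothetical Python harness, not the NBER experiment or the Stata report's method: `cpu_task` stands in for a block of regressions, speedup is measured against one worker, and efficiency is taken as speedup divided by the number of workers. `os.cpu_count()` reports logical CPUs, so on a hyper-threaded machine the largest count includes the hyper-cores.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def cpu_task(n):
    """Hypothetical CPU-bound stand-in for one block of regressions."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def time_run(n_workers, n_jobs=16, job_size=300_000):
    """Wall-clock seconds to finish n_jobs equal tasks on n_workers processes."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(cpu_task, [job_size] * n_jobs))
    return time.perf_counter() - start


def scaling_table(worker_counts, run):
    """Return (workers, seconds, speedup, efficiency) rows.

    worker_counts must start with 1; speedup is relative to that run and
    efficiency is speedup divided by the number of workers.
    """
    if not worker_counts or worker_counts[0] != 1:
        raise ValueError("worker_counts must start with 1")
    baseline = run(1)
    rows = [(1, baseline, 1.0, 1.0)]
    for k in worker_counts[1:]:
        seconds = run(k)
        speedup = baseline / seconds
        rows.append((k, seconds, speedup, speedup / k))
    return rows


if __name__ == "__main__":
    # os.cpu_count() counts logical CPUs, i.e. including hyper-threads.
    logical = os.cpu_count() or 1
    counts = sorted({1, 2, max(1, logical // 2), logical})
    for k, sec, sp, eff in scaling_table(counts, time_run):
        print(f"{k:>3} workers  {sec:7.2f} s  speedup {sp:5.2f}  efficiency {eff:6.1%}")
```

If efficiency drops sharply once the worker count exceeds the number of physical cores, the extra hyper-cores are adding little for that workload; the same comparison can be repeated with the real Stata jobs.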
Daniel Feenberg
NBER
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/