Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Efficient parallel computing in Stata/MP
From: Demian Panigo <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Efficient parallel computing in Stata/MP
Date: Sat, 28 Sep 2013 09:42:37 -0300
Thank you very much, Alan.
Our new code runs regress, xtreg, and many other regression commands.
I will take your advice seriously. No doubt we need to check this
issue by experimenting.
Thanks a lot...
Demian
P.S.: One final question, Alan. Why should I use "parallel"
together with multiple Stata/SE instances? More precisely,
multiple Stata/SE instances will already be fully exploiting the
multiple cores, so by adding the parallel command it seems to me that
we would create internal competition for the available processing power.
Anyway, all these uncertainties will disappear by means of proper experiments.
Thanks again
Demian
2013/9/27 Alan Riley <[email protected]>:
> Demian ([email protected]) needs to run many regressions and
> wonders whether it is better to run a single Stata/MP instance,
> running all regressions in it, or multiple instances, each
> running a subset of the regressions.
>
> The original question was
>
>> Dear Statalist members: I need some help, because I'm not sure about
>> some Stata/MP properties for parallel computing.
>> We know from http://www.stata.com/statamp/statamp.pdf that many
>> estimation commands (e.g. regress) are almost fully parallelizable and
>> that average efficiency for all commands is around 72%. So, in
>> standard linear regression problems (e.g running one million equations
>> for parameter stability analysis), using Stata/MP in a multiple-core
>> CPU would be an optimal time saving strategy.
>> However, it is also possible to exploit the multi-core CPU environment
>> by working with multiple parallel Stata/MP instances (e.g. using 4
>> parallel Stata/MP instances to run 250,000 linear regressions with
>> each core).
>> My question is simple.... Can I save some time by using this "dual
>> parallelism" methodology? (because parallel computing is
>> automatically used by Stata/MP to parallelize internal tasks of, for
>> example, regress; and because I also parallelize the whole set of
>> regressions between 4 cores, by means of multiple Stata/MP instances).
>
> The answer is ... it depends. More information is needed to be
> able to answer this question. In later emails in this thread,
> Demian mentioned panels -- so it is unclear to me whether only
> the -regress- command is desired or whether some other estimation
> command is being executed.
>
> The next question I have is what the number of variables and observations
> is in each regression. If Demian is using -regress-, and has even
> a moderate number of observations, the best solution may simply be
> to use a single instance of Stata/MP utilizing all cores on his machine
> to run all the regressions sequentially. -regress- is almost
> perfectly parallelized, so as long as there are enough observations
> in each regression, there would be no point in launching multiple
> instances of Stata to run separate regressions.
>
> If Demian does wish to run multiple instances of Stata, each running
> a separate set of regressions, rather than a single instance of
> Stata/MP running all regressions, I would recommend that he use
> 1 core per instance of Stata rather than having multiple Stata/MP
> instances competing against each other for the available cores
> on the computer. I would also recommend, as others have pointed
> out in this thread, that if Demian wants to run separate instances
> of Stata, that he take a look at the 'parallel' prefix command
> by George Vega which he presented at the 2013 Stata Conference
> in New Orleans:
>
> http://www.stata.com/meeting/new-orleans13/abstracts/materials/nola13-vega.pdf
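
A minimal sketch of the one-core-per-instance setup recommended above, meant to sit at the top of the do-file that each separate instance runs. It assumes Stata/MP; -set processors- is not available in other flavors of Stata, hence the c(MP) guard. The display lines are only an illustration of how to check what each instance is actually using.

```stata
* Limit this instance to a single core so that simultaneous instances
* do not compete for processors. Only Stata/MP supports -set processors-.
if c(MP) {
    set processors 1
}

* Report what this instance uses versus what the machine and license allow.
display "processors in use:     " c(processors)
display "processors on machine: " c(processors_mach)
display "processors licensed:   " c(processors_lic)
```
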
>
> Given the number of regressions Demian needs to run (in another
> email he mentioned 100 million regressions), I would recommend that
> he experiment. Assuming he has an 8-core machine, he should run,
> say, 8000 regressions one after another using Stata MP/8. Then, run
> the same 8000 regressions, 1000 on each core in 8 instances of Stata
> using 1 core each. Compare the timing, and decide how to proceed.
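
The experiment described above could be sketched as a single do-file that is run once with all cores and then once per instance with one core each. Everything specific here is an assumption for illustration: the file name timing.do, the three arguments, the auto dataset, and the model are placeholders, and the real data and regressions should be substituted. Note that the auto dataset has only 74 observations, far too few for Stata/MP's parallelization to show any benefit.

```stata
* timing.do (hypothetical name): times regressions number `first' through
* `last' using `ncores' processors. Example calls from within Stata:
*     do timing 1 8000 8      // one Stata/MP instance using all 8 cores
*     do timing 1 1000 1      // one of 8 instances, using 1 core
args first last ncores

* -set processors- exists only in Stata/MP and fails if `ncores' exceeds
* what the machine or license allows.
if c(MP) {
    set processors `ncores'
}

sysuse auto, clear                      // placeholder data

timer clear 1
timer on 1
forvalues i = `first'/`last' {
    quietly regress price mpg weight    // placeholder model
}
timer off 1

* -timer list- leaves the elapsed seconds in r(t1).
timer list 1
display "Regressions `first'-`last' took " r(t1) " seconds"
```

For the multi-instance run, the eight instances need to be started at the same time (each with its own range, 1-1000, 1001-2000, and so on), and the figure to compare against the single-instance time is the longest of the eight elapsed times, not their sum.
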
>
> As I said above, -regress- is so nearly perfectly parallelized that
> even with a moderate number of observations, a single instance of
> Stata/MP may well be the best way to go, and it is certainly
> easier to set up a single job rather than to run multiple simultaneous
> jobs and combine the results of each.
>
> If Demian has further questions, or would like to discuss the problem
> in more detail, he can email Technical Services at [email protected]
> and we'll be happy to give him advice.
>
>
> Alan
> [email protected]
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
--
Demian T. Panigo
Licentiate in Economics, UNLP,
Master in Social Sciences, UBA,
PhD in Economics, EHESS-ENS (Paris)
Adjunct Researcher at CEIL-PIETTE, CONICET
Teaching researcher at UNM, UNLP, UBA, and
Paris-Jourdan Sciences Economiques-ENS.
Member of the Programa de Formación Popular en Economía (PROFOPE)