
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: Efficient parallel computing in Stata/MP


From   Demian Panigo <[email protected]>
To   [email protected]
Subject   Re: st: RE: Efficient parallel computing in Stata/MP
Date   Fri, 27 Sep 2013 11:52:13 -0300

Thanks, Stas. I agree.
It appears that database splitting (the key to Stata/MP parallelism)
is not very useful for small-sample problems, while multiple
Stata instances running in parallel are the best choice for my new
command. Your point about time spent on OS tasks (e.g. hard drive
interactions) is very interesting. An SSD could help.
I'll run some experiments to obtain precise figures for the different options.
Thank you again
Demian
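
For anyone who wants to try the "many Stata/SE instances" setup discussed in this thread, here is a minimal sketch of a launcher, written in Python only for convenience. It splits the regression ids into contiguous chunks and starts one batch-mode Stata process per chunk. Everything named here is an assumption for illustration: the executable name `stata-se`, the do-file `worker.do` (which would read the first and last task id from its two arguments), and the `job_N` working directories. The `-b do` form is the Unix batch invocation; Windows uses a different switch. Each instance gets its own directory because, as far as I know, batch mode writes its log to the current directory under the do-file's name.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def split_range(n_tasks: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split task ids 1..n_tasks into contiguous, near-equal (first, last) chunks."""
    base, extra = divmod(n_tasks, n_workers)
    chunks = []
    start = 1
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # spread the remainder over the first chunks
        if size == 0:
            continue  # more workers than tasks: skip empty chunks
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def build_commands(n_tasks: int, n_workers: int,
                   stata_exe: str = "stata-se",      # hypothetical executable name
                   do_file: str = "worker.do") -> List[List[str]]:  # hypothetical do-file
    """Build one batch-mode command line per chunk of tasks."""
    do_path = str(Path(do_file).resolve())  # absolute, since each job runs in its own directory
    return [[stata_exe, "-b", "do", do_path, str(first), str(last)]
            for first, last in split_range(n_tasks, n_workers)]


def run_all(commands: List[List[str]], workdir_prefix: str = "job_") -> List[int]:
    """Start every command in its own working directory and wait for all of them."""
    procs = []
    for i, cmd in enumerate(commands, start=1):
        workdir = Path(f"{workdir_prefix}{i}")
        workdir.mkdir(exist_ok=True)
        procs.append(subprocess.Popen(cmd, cwd=workdir))
    return [p.wait() for p in procs]  # exit codes, one per instance


if __name__ == "__main__":
    # Dry run: print the eight command lines without starting Stata.
    for cmd in build_commands(100_000_000, 8):
        print(" ".join(cmd))
```

Calling `run_all(build_commands(100_000_000, 8))` would actually start the eight instances. Using more, smaller chunks than cores, run a few at a time, would limit how much work is lost if one instance crashes, at the cost of more process start-ups.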



2013/9/27 Stas Kolenikov <[email protected]>:
> You would want to break this into as many single-process threads as
> you can, simply for stability purposes: if for whatever reason a thread
> crashes, you will lose a few dozen hours in a single high-thread-count MP
> run, but you will only lose single-digit hours with many Stata SE
> instances. You will hardly see much speed gain from MP with your
> tiny data, and most of your computing time is going to be spent on
> -post-ing your results, i.e., the CPU waiting for the hard drive.
> This is not a Stata task, this is an OS task, and you can keep other
> cores busy with running your -xtreg- in another instance of Stata.
>
> -- Stas Kolenikov, PhD, PStat (ASA, SSC)
> -- Senior Survey Statistician, Abt SRBI
> -- Opinions stated in this email are mine only, and do not reflect the
> position of my employer
> -- http://stas.kolenikov.name
>
>
>
> On Fri, Sep 27, 2013 at 7:12 AM, Demian Panigo <[email protected]> wrote:
>> Excuse me... one more point.
>> When I say many regressions..... many is about one hundred million.
>> Thanks again
>> Demian
>>
>> 2013/9/27 Demian Panigo <[email protected]>:
>>> Thank you very much Daniel:
>>> Just one more question.
>>> You finally used 24 cores (cores and hypercores) to run 3 parallel
>>> Stata MP/8 jobs with interesting time saving outcomes.
>>> But did you compare these results with those obtained by just running 24
>>> parallel Stata SE jobs in, for example, batch mode?
>>> In other words, if my problem has a lot of parallelizable tasks (e.g.
>>> many independent linear regressions) and they must be performed on a
>>> small database (e.g. 50 variables with 1000 observations each) using
>>> an 8-core CPU (in my University there are more powerful servers but
>>> not always available), should I rely on a single Stata/MP8 instance, 2
>>> Stata/MP4 parallel instances (with properly rewritten code) or 8
>>> Stata/SE instances?
>>> Which is better?
>>> Thanks in advance
>>> Demian
>>>
>>>
>>>
>>> 2013/9/27 Daniel Feenberg <[email protected]>:
>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Demian Panigo
>>>>> Sent: 27 September 2013 01:02
>>>>> To: [email protected]
>>>>> Subject: st: Efficient parallel computing in Stata/MP
>>>>>
>>>>> Dear Statalist members: I need some help, because I'm not sure about
>>>>> some Stata/MP properties for parallel computing.
>>>>> We know from http://www.stata.com/statamp/statamp.pdf that many
>>>>> estimation commands (e.g. regress) are almost fully parallelizable and
>>>>> that average efficiency for all commands is around 72%. So, in
>>>>> standard linear regression problems (e.g. running one million equations
>>>>> for parameter stability analysis), using Stata/MP in a multiple-core
>>>>> CPU would be an optimal time saving strategy.
>>>>> However, it is also possible to exploit the multi-core CPU environment
>>>>> by working with multiple parallel Stata/MP instances (e.g. using 4
>>>>> parallel Stata/MP instances to run 250,000 linear regressions with
>>>>> each core).
>>>>> My question is simple.... Can I save some time by using this "dual
>>>>> parallelism" methodology? (because parallel computing is
>>>>> automatically used by Stata/MP to parallelize internal tasks of, for
>>>>> example, regress; and because I also parallelize the whole set of
>>>>> regressions between 4 cores, by means of multiple Stata/MP instances).
>>>>> Thanks in advance
>>>>> *
>>>>
>>>>
>>>> In my experience, Stata/MP fully exploits as many real cores as are
>>>> available, very efficiently for regression commands. If you have hypercores,
>>>> running multiple Stata jobs will exploit those efficiently also. I posted
>>>> the results of a simple experiment at:
>>>>
>>>>   http://www.nber.org/stata/efficient
>>>>
>>>> under heading "Stata/MP".
>>>>
>>>> -parallel.ado- is a very interesting routine. It will start up multiple
>>>> Stata processes and let each one read a part of the dataset, then combine
>>>> the results into a single dataset. For processes that are single-threaded
>>>> for no good reason, or if you don't have Stata/MP, it seems like a great
>>>> idea. I believe it will also work well with hyper-cores, but I have no
>>>> experience with it. But for I/O it would just make things worse, since each
>>>> thread has to read the entire dataset.
>>>>
>>>> See
>>>>
>>>>   http://www.stata.com/statamp/report.pdf
>>>>
>>>> for a more discouraging report on hyper-cores. I don't have an explanation
>>>> for the difference in experiences. There is no substitute for
>>>> experimentation on your actual hardware, and there would be interest on this
>>>> list in your experience.
>>>>
>>>> Daniel Feenberg
>>>> NBER
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>>>
>>>
>>> --
>>> Demian T. Panigo
>>> BA in Economics, UNLP,
>>> Master in Social Sciences, UBA,
>>> PhD in Economics, EHESS-ENS (Paris)
>>> Associate Researcher at CEIL-PIETTE, CONICET
>>> Teaching researcher at UNM, UNLP, UBA, and
>>> Paris-Jourdan Sciences Economiques-ENS.
>>> Member of the Programa de Formación Popular en Economía (PROFOPE)
>>
>>
>>
>





