Re: st: MP running no faster than IC
From: Ted Player <[email protected]>
To: [email protected]
Subject: Re: st: MP running no faster than IC
Date: Mon, 8 Jul 2013 22:09:28 -0600

The benchmark tests I originally described were conducted on a local
machine. I did a follow-up with an EC2 machine (as described
elsewhere in this thread).
I see now that buried on p. 231 of Stata's MP performance report is
a note that getting the improvements Stata claims for regression
requires a single regression model with 180 regressors and a dataset
with 1,500,000 observations. I usually do things like
bootstrap analyses on datasets with 500 observations, so I guess MP
isn't any more useful to me than SE.
It looks like I fell for the advertising hype on
http://www.stata.com/statamp . It's my fault for thinking Stata
wouldn't overclaim to make their software seem better than it really
is. Live and learn I guess!
On Mon, Jul 8, 2013 at 9:42 PM, Sergiy Radyakin <[email protected]> wrote:
> Dear Ted, I've witnessed many times that MP works much faster than IC.
> The figures in the report do make sense. Now looking at your example:
> the only parallelizable part here is the "regress mpg weight gear
> foreign." Two things to notice immediately are the following:
>
> 1) the dataset contains 74 observations. The overhead of parallelizing
> it across 12 CPUs or even 4 CPUs is large relative to the size of the
> task at hand. You are likely to see the benefits of parallelization
> when you -expand- your dataset, say 1000000 (10^6) times and perhaps
> reduce the number of bootstrap iterations.
>
> 2) the dataset contains 74 observations. So the _regress command
> (internal) takes, say, 0.00001 second and with parallelization takes
> maybe 0.000001 second, but then you have 2 seconds of writing the
> output to the screen and scrolling the output window. That is not
> parallelized (correct me if I am wrong), though scrolling seems to
> work much faster in recent versions (THANKS!). So, try disabling the
> output with -quietly- and you will see more performance gain from MP.
>
> 3) finally, Stata's ado files do not seem to be parallelizable (you
> don't write them that way); only internal commands are. There have been
> some changes in the most recent versions and the idea is to permit
> users to write parallel code. I have yet to see these facilities, but it
> makes no sense to test parallelization benefits on do/ado code or
> where such code executes for a significant amount of time. This is
> also a reason why there is no need to separately benchmark bootstrap
> commands.
>
> To summarize the above, try the following commands on LARGE datasets
> (occupy e.g. half of your memory with data):
> mlogit - you should see about a 3-fold performance increase on a
> 12-CPU MP vs 1-CPU IC.
> summarize - you should see about an 11-fold performance increase on a
> 12-CPU MP vs 1-CPU IC.
>
> Run tests on a local machine. Perhaps it's Amazon that is to blame
> (I don't mean it). Some hosting providers limit your TOTAL computing
> power, so you can get 128 cores with the same total performance as
> 1 core. Then you are better off with a single-CPU license of course :)
>
> Hope this helps.
> Best, Sergiy Radyakin
>
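
Points 1) and 2) in the quoted message can be tried with a rough timing
sketch along the following lines. It is only an illustration, not a
benchmark from this thread: the expansion factor of 10000 and the timer
numbers are arbitrary assumptions, and the full variable name gear_ratio
is used in place of the abbreviation "gear". Run it as a do-file once in
each flavor being compared.

```stata
* Rough sketch: time -regress- on the original and an expanded auto dataset.
* The expansion factor and timer numbers are arbitrary illustrative choices.
clear all

* Report which flavor is running and how many processors it will use.
display "flavor: " c(flavor) "  MP: " c(MP) "  processors: " c(processors)

* 74 observations, output suppressed with -quietly-.
sysuse auto, clear
timer clear
timer on 1
quietly regress mpg weight gear_ratio foreign
timer off 1

* 74 x 10000 = 740,000 observations, same model.
expand 10000
timer on 2
quietly regress mpg weight gear_ratio foreign
timer off 2

* Timer 1 is the small dataset, timer 2 the expanded one.
timer list
```

A single regression on 74 observations may finish too quickly for the
timer to register, so timer 1 may show close to zero in every flavor;
any difference between flavors would be expected mainly in timer 2.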