Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: MP running no faster than IC
From: Sergiy Radyakin <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: MP running no faster than IC
Date: Mon, 8 Jul 2013 23:42:38 -0400
Dear Ted, I've witnessed many times that MP works much faster than IC.
The figures in the report do make sense. Now, looking at your example:
the only parallelizable part here is the "regress mpg weight gear
foreign." Three things to notice immediately are the following:
1) The dataset contains only 74 observations. The overhead of splitting
the work across 12 CPUs, or even 4, is large relative to the size of the
task at hand. You are likely to see the benefits of parallelization
when you -expand- your dataset, say, 1,000,000 (10^6) times and perhaps
reduce the number of bootstrap iterations.
2) Again, the dataset contains only 74 observations. So the internal
_regress command takes, say, 0.00001 second, and with parallelization it
takes maybe 0.000001 second, but then you have 2 seconds of writing the
output to the screen and scrolling the output window. That part is not
parallelized (correct me if I am wrong), though scrolling seems to be
much faster in recent versions (THANKS!). So try suppressing the output
with -quietly- and you will see more of the performance gain from MP.
3) Finally, Stata's ado-files do not seem to be parallelized (they are
not written that way); only internal commands are. There have been
some changes in the most recent versions, and the idea is to permit
users to write parallel code. I have yet to see these facilities, but it
makes no sense to test parallelization benefits on do/ado code, or on
tasks where such code runs for a significant share of the time. This is
also why there is no need to benchmark bootstrap commands separately.
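Points 1 and 2 can be combined into a variant of benchmark.do. This is a minimal sketch, assuming Stata/MP on a 4-core machine (-set processors- is available only in Stata/MP). The expansion factor of 10,000 (smaller than the 10^6 suggested above, to keep run time modest), reps(20), and the timer numbers are illustrative choices, not values from this thread. gear_ratio is the full name of the variable abbreviated as gear in the original.

```stata
* Sketch only: benchmark.do variant with more data, fewer reps, no output.
* Assumes Stata/MP with 4 cores; expand factor and reps() are illustrative.
clear all
sysuse auto
expand 10000                    // 74 -> 740,000 observations
timer clear
foreach p in 1 4 {
    set processors `p'          // Stata/MP only
    timer on `p'
    quietly bootstrap, nodots reps(20) seed(1): regress mpg weight gear_ratio foreign
    timer off `p'
}
quietly timer list
display "1 CPU: " r(t1) " s;  4 CPUs: " r(t4) " s"
```

Because -bootstrap- restores the data after each run, both timings use the same expanded dataset, and -quietly- keeps screen output out of the measured time.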
To summarize the above, try the following commands on LARGE datasets
(e.g., occupy half of your memory with data):
mlogit - you should see about a 3-fold performance increase on 12-CPU
MP vs. 1-CPU IC.
summarize - you should see about an 11-fold performance increase on
12-CPU MP vs. 1-CPU IC.
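The suggested comparison can be timed with a small harness like the one below. This is a minimal sketch, assuming Stata/MP with 4 cores, so any speedups will differ from the 12-CPU figures quoted above. The outcome variable pricecat (price terciles), the expand factor, and the timer numbers are hypothetical choices made for illustration. Increase the expand factor until the data fill a large share of memory.

```stata
* Sketch only: hypothetical harness timing summarize and mlogit on
* 1 vs 4 processors. Assumes Stata/MP with 4 cores.
clear all
sysuse auto
xtile pricecat = price, nquantiles(3)   // illustrative 3-category outcome
expand 10000                            // 74 -> 740,000 observations
timer clear
foreach p in 1 4 {
    set processors `p'                  // Stata/MP only
    local m = 10 + `p'                  // separate timer id for mlogit
    timer on `p'
    quietly summarize price weight mpg
    timer off `p'
    timer on `m'
    quietly mlogit pricecat weight mpg
    timer off `m'
}
quietly timer list
display "summarize: " r(t1) " s (1 CPU) vs " r(t4) " s (4 CPUs)"
display "mlogit:    " r(t11) " s (1 CPU) vs " r(t14) " s (4 CPUs)"
```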
Also, run the tests on a local machine. Perhaps Amazon is to blame
(no offense intended). Some hosting providers limit your TOTAL computing
power, so you can get 128 cores with the same total performance as
1 core. Then you are better off with a single-CPU license, of course :)
Hope this helps.
Best, Sergiy Radyakin
On Mon, Jul 8, 2013 at 9:41 PM, Ted Player <[email protected]> wrote:
> Short version: Stata MP 12-core isn't running my code any faster than
> it did when I used Stata IC, and I can't figure out why.
>
> Detailed version: I am running Windows 7 Pro SP1 64-bit on a
> quad-core machine. I have purchased two Stata licenses. I purchased
> Intercooled when version 12 was released. I recently purchased MP-12
> core to make my Stata code run faster. (I realize I only have four
> cores so the 12 core is overkill; I want the flexibility to use
> Amazon's EC2, so I purchased the 12 core version.) Both flavors of
> Stata are version 12.
>
> Unfortunately, I am finding that MP does not run any faster than IC.
> Indeed, in all my tests MP is a little slower. To document the issue,
> I did a fresh install of Stata Intercooled and then I ran benchmark.do
> (below). I ran it three times, and the average run time was 18.4
> seconds. Then I uninstalled Stata completely, installed Stata MP-12
> core, and ran benchmark.do again. I ran it three times, and the
> average was 19.6 seconds. I'm disappointed that MP isn't running
> faster.
>
> The benchmark program shown below performs a bootstrap of regression.
> According to the Stata/MP Performance Report
> (http://www.stata.com/statamp/report.pdf), replication-based commands
> such as bootstrap were not benchmarked for the report because "these
> commands run another target command repeatedly, and to the extent the
> target command's performance is improved for a particular problem
> size, a similar improvement will be obtained when it is run
> repeatedly" (p. 7). In the benchmark program below, the target
> command is regression (which the report shows to be markedly improved
> for MP). The part of the Stata/MP Performance Report I have quoted
> here seems to suggest I should expect a performance improvement in my
> setup when using bootstrap.
>
> Stata makes positively *glowing* claims about MP (e.g.,
> http://www.stata.com/stata12/stata-mp), but I have yet to find any
> improvement whatsoever.
>
> I have done a creturn list to verify that I have Stata/MP installed
> correctly. The relevant parts of the creturn list are shown below:
>
> c(MP) = 1
> c(processors) = 4 (Stata/MP, set processors)
> c(processors_lic) = 12
> c(processors_mach) = 4
> c(processors_max) = 4
> c(os) = "Windows"
> c(osdtl) = "64-bit"
> c(machine_type) = "PC (64-bit x86-64)"
>
>
> I should mention that when I look at the CPU usage with Windows Task
> Manager, it stays at 25% while benchmark.do is running MP-12 core.
> Also, I should mention that under the MP-12 core install, I have tried
> "set processors 1", and I get practically the same performance that I
> get from "set processors 4". It seems to me that MP isn't using the
> extra cores.
>
> Can anyone explain to me why I'm not getting any better performance
> from MP-12 core than I'm getting from IC?
>
>
> benchmark.do
> ----------------------------------------------------------------------------------------------
> clear all
> sysuse auto
> timer on 1
> bootstrap, nodots reps(5000) seed(1): regress mpg weight gear foreign
> timer off 1
> quietly timer list
> local elapsed = r(t1)
> display "This benchmark process required ... `elapsed' ... seconds"
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/