Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Best machine to build for running STATA
From: Michael Norman Mitchell <[email protected]>
To: [email protected]
Subject: Re: st: Best machine to build for running STATA
Date: Mon, 22 Feb 2010 17:56:49 -0800
Dear Dana
I am glad that you found this information helpful... here is more
information about your queries...
On 2010-02-22 4.50 PM, Dana Chandler wrote:
Hi Michael -
This is extremely helpful. I really appreciate the link to the report
and the estimation of how much memory a dataset will take up.
I have a few follow-up questions:
Re: Memory allocation... Christopher Baum mentions that for a 13 GB
data set, 24GB of RAM would be recommended. Are there any rules of
thumb people use in terms of how much memory a system should have to
comfortably do analysis on a dataset of a given size? Also, how much
memory should be allocated to a dataset? If you have a 250 MB dataset,
is allocating 1 GB overkill? Can this be harmful?
As described at http://www.stata.com/support/faqs/win/pcreqs.html, Stata
recommends "50% more memory than the size of your largest dataset". This
is because Stata needs contiguous memory, and sometimes the operating
system chops up available memory into blocks that are not necessarily
contiguous. You may have seen this yourself: Windows reports 1.8
gigabytes of memory available, yet Stata can allocate only 1.0 gigabyte.
A related issue, raised by others in today's threads, is the benefit of
a 64-bit OS for being able to access more memory.
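To make that rule of thumb concrete, here is a small sketch. The 50% headroom figure comes from the Stata FAQ linked above; the dataset-size estimate (observations times record width in bytes) is a common back-of-the-envelope approximation, not anything official from StataCorp:

```python
def dataset_size_bytes(n_obs, width_bytes):
    """Rough in-memory size of a dataset: observations x record width.

    width_bytes is the sum of each variable's storage size
    (e.g. 4 for float or long, 8 for double, the declared length for strings).
    """
    return n_obs * width_bytes

def recommended_ram_bytes(dataset_bytes, headroom=0.5):
    """The Stata FAQ suggests 50% more memory than your largest dataset."""
    return dataset_bytes * (1 + headroom)

# Example: 1,000,000 observations at 250 bytes each is roughly a 250 MB dataset
size = dataset_size_bytes(1_000_000, 250)
print(size / 1024**2)                          # dataset size in MiB
print(recommended_ram_bytes(size) / 1024**2)   # suggested memory in MiB
```

By this arithmetic, a 250 MB dataset calls for only about 375 MB of memory, which is why allocating a full gigabyte is not harmful in itself, just unnecessary.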
I think allocating more memory than you need is generally harmless...
the only exception would be if you were able to allocate so much memory
that Windows started to need to use virtual memory instead of real
memory. In such a case, your machine will slow to a crawl as Windows
furiously grinds the hard drive for virtual memory.
I am not an expert in these matters either. My feeling is that hard
drive speed is a rather trivial issue with regard to Stata performance
since the data files are read into and processed in memory. Memory speed
may be more critical. I think whatever makes memory fast for other tasks
(like gaming or databases) would also make that memory fast for Stata.
In other words, if you find reviews saying xyz memory is really fast for
general computer applications, it is likely it could be useful for
Stata. How useful, I could not say.
Re: hardware... I'm not an expert on different kinds of RAM and hard
drives. Does anyone have any experience with what types of RAM (SRAM
vs. DRAM vs. ??) or hard drives (SCSI vs. SATA or ATA) might work best
with Stata? What about ways to optimize the page file or the use of
virtual memory?
Re: the report... it is mentioned that they chose a "problem size"
that was relatively large to run all the simulations that measured
speed of X processors vs. 1. I may have missed it, but do they ever
mention if the ratio of the gains or the "percentage parallelization"
stays constant as the problem size grows? I frequently encounter
problem sizes larger than those stated and would like to know if the
percentage parallelizations will remain about the same.
This is a very good question. I think that is one for Stata tech
support, to speculate whether they expect the pattern of results to
generalize as the problem size grows. My expectation is that the
results would hold steady as the problem grew, since the parts that are
made parallel are likely the most computationally intensive parts, and
hence would form the bulk of the work for large problems. My experience
with "xtreg", for example, was that even for pretty large problems I
realized the same kinds of performance gains as (or slightly higher
than) those shown in the report. I
believe that report is the most comprehensive benchmark I have ever seen
showing the gains in performance for additional processors for any
software, and certainly for any statistical software.
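One way to reason about whether those gains should hold as problems grow is Amdahl's law. This framing is mine, not anything from the Stata/MP report, but it captures the "percentage parallelization" idea: the speedup on n cores depends only on the fraction p of the work that runs in parallel, so if that fraction stays roughly constant as the problem grows, so does the speedup:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: overall speedup on n cores when a fraction p
    of the work is parallelized and the rest (1 - p) stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# If 90% of a command's work is parallelized, four cores give roughly a
# 3x speedup, which is consistent with the four-core gains discussed here.
print(amdahl_speedup(0.9, 4))
```

The serial fraction also explains why the curve flattens: with p = 0.9, no number of cores can ever deliver more than a 10x speedup.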
In short, I think that you could obtain relatively small gains in
performance with the fastest hard drive in the world, and relatively
small gains in performance based on memory speed. My belief is that the
two factors that will dominate your speed will be having sufficient
memory (so you do not ever use "virtual memory") and having Stata/MP to
gain the benefits of the multiple processors. Those two factors, I
believe, can improve your performance by a factor of three (up to three
times faster with four processors than one). By contrast, I think the
gains with respect to hard drives and memory would be measured on the
order of ten to twenty percent.
I hope that is helpful, and would love others to weigh in with their
experience, especially differences of opinion.
Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
Visit me on Facebook at...
http://www.facebook.com/MichaelNormanMitchell
On Mon, Feb 22, 2010 at 4:56 PM, Michael Norman Mitchell
<[email protected]> wrote:
Greetings
Two factors come to my mind as being very important...
1) Having sufficient memory. This has been discussed today on the
statalist, with links to how you can calculate your memory needs.
2) Whether you will be using Stata/MP, and how many cores you want to get
(both for your Stata/MP license and physical cores). For large statistical
models, you can save considerable time running models on four cores, for
example. This link contains a detailed report showing the time savings one
gets using Stata/MP, and how much time savings you obtain for each
additional core you add for each command.
http://www.stata.com/statamp/report.pdf
These are not the only factors, but I feel they are among the major
factors.
I hope this helps,
Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
Visit me on Facebook at...
http://www.facebook.com/MichaelNormanMitchell
On 2010-02-22 2.27 PM, Dana Chandler wrote:
Hi fellow Statalisters -
I was wondering if anyone has any suggestions or guidelines for the
ideal type of machine to build for intensive Stata use.
In particular, if you wanted to be able to run saturated regression
models on large (several gigabyte) datasets in Stata, what would the
ideal setup be? This is a computer that will be used exclusively for
data-intensive tasks, and mostly with Stata.
The only requirement is that it be built on a Windows x86 operating
system. What type of hardware makes for the speediest Stata experience:
hard drive type, RAM type, number of processors, etc.?
Thanks in advance,
Dana
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*