Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Machine spec for 70GB data
From
Daniel Feenberg <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Machine spec for 70GB data
Date
Sat, 22 Oct 2011 08:47:59 -0400 (EDT)
On Sat, 22 Oct 2011, Gindo Tampubolon wrote:
Dear all,
I need to process a large data file [70GB; a few million obs] with
Stata 12 MP8. Mainly to do crossed random effects, individuals and
hospitals, where the outcome is length of stay [controlling for no more
than a handful of covariates to begin with]. As an approximation, the
outcome is treated as continuous i.e. linear mixed models.
What kind of machine spec would be needed? Any ideas, information,
experience? Would operating system make any difference? I'm open to
consider Windows, Linux, OS X.
Once you have 64-bit versions of the operating system and Stata, Linux v.
Windows won't make much difference, but you really need to establish how
much memory you will need. Machines that offer more than 24GB of memory
are much more expensive than smaller machines so you can save quite a bit
if you can limit your maximum "set memory" to 18 GB or so.
If you are able to read a subset of the data into a machine you already
have, that can give you an idea of how much memory you will need for the
full dataset. You say "a few million observations" but unless "few" means
thousands you should be able to get by with far less than 70GB of memory.
You don't say how many variables, or how many are float or int. If you
have 250 four-byte variables (float or long), you can store about a million
observations per GB; Stata's two-byte ints take half that space. Stata
doesn't need much more memory than that which is used for the data.
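The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a rough illustration: the byte widths are Stata's documented numeric storage sizes (an int is 2 bytes; a long or float is 4), while the 250-variable layout and the 5 million observation count are hypothetical stand-ins for the real dataset, and the function names are made up for this example. The "about a million observations per GB" figure corresponds to 250 four-byte variables.

```python
# Storage width in bytes of Stata's numeric types.
STATA_BYTES = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}
GIB = 2**30  # one gigabyte, counted as 2^30 bytes


def bytes_per_obs(var_counts):
    """Width of one observation, given a {type: number of variables} dict."""
    return sum(STATA_BYTES[t] * n for t, n in var_counts.items())


def obs_per_gb(var_counts):
    """Whole observations that fit in one gigabyte of data memory."""
    return GIB // bytes_per_obs(var_counts)


def gb_needed(var_counts, n_obs):
    """Gigabytes of data memory needed for n_obs observations."""
    return bytes_per_obs(var_counts) * n_obs / GIB


# Hypothetical layout: 250 four-byte variables.
layout = {"float": 250}
print(bytes_per_obs(layout))                   # 1000
print(obs_per_gb(layout))                      # 1073741
print(round(gb_needed(layout, 5_000_000), 2))  # 4.66
```

Under these assumptions, 5 million observations need under 5 GB, far below the 70GB file size. The real figure depends on the actual variable types, which reading a subset of the data would reveal.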
I have posted some suggestions for working with large datasets in Stata at
http://www.nber.org/sys-admin/large-stata-datasets.html
the main point of which is that if you separate the sample selection from
the analysis steps, it is possible to work with very large datasets in
reasonable core sizes (if the analysis is only on a subset, of course).
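As a rough sketch of that separation (in Python rather than Stata, and not taken from the page linked above), the selection step can stream through the raw file one row at a time and write out only the rows and columns the analysis needs. The column names, the selection rule, and the function name select_sample are all hypothetical.

```python
import csv
import io


def select_sample(src, dst, keep_columns, keep_row):
    """Stream rows from src to dst, keeping only wanted rows and columns.

    Only one row is held in memory at a time, so the input can be far
    larger than available RAM. Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" drops every column not listed in keep_columns.
    writer = csv.DictWriter(dst, fieldnames=keep_columns, extrasaction="ignore")
    writer.writeheader()
    kept = 0
    for row in reader:
        if keep_row(row):
            writer.writerow(row)
            kept += 1
    return kept


# Tiny in-memory stand-in for the large raw file (hypothetical columns).
raw = io.StringIO(
    "patient_id,hospital_id,los_days,year,notes\n"
    "1,A,3,2009,x\n"
    "2,A,7,2010,y\n"
    "3,B,2,2010,z\n"
)
extract = io.StringIO()
n = select_sample(
    raw,
    extract,
    keep_columns=["patient_id", "hospital_id", "los_days"],
    keep_row=lambda r: r["year"] == "2010",
)
print(n)  # 2
print(extract.getvalue())
```

The analysis step then loads only the extract, whose size is set by the selected rows and columns rather than by the raw file.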
There is some information on the Stata website:
http://www.stata.com/support/faqs/win/winmemory.html
http://www.stata.com/support/faqs/data/dataset.html
It is possible to get computers with up to 256 GB of memory for
reasonable prices (for some definitions of reasonable, such as
$US25,000) and that can be convenient. It probably isn't necessary,
though.
Dan Feenberg
Many thanks,
Gindo
University of Manchester
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/