Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Machine spec for 70GB data, Summary

From	Yuval Arbel <[email protected]>
To	[email protected]
Subject	Re: st: Machine spec for 70GB data, Summary
Date	Mon, 24 Oct 2011 10:39:11 +0200

Gindo,

I would advise you to try the operating system you already have first. What
have you got to lose? In my case, and to my surprise, it worked fine on
a laptop, including the regression procedures. It does take a
couple of minutes for the software to read the data, but the bottom
line is that a reasonable computer might be enough, and then
you don't need to spend time studying other platforms.

Yuval

On Mon, Oct 24, 2011 at 10:11 AM, Gindo Tampubolon
<[email protected]> wrote:
> Dear all,
>
> Thanks for all the informative and prompt replies, in particular to Yuval, Jeroen, Billy, Dan, Joerg, and Buzz. It seems worthwhile to explore other ways/platforms for doing this stuff.
> Gindo
>
> ----------------------------------------------------------------------
> Jeroen wrote:
> I read your question on using Stata to fit large cross-classified models on a 70GB dataset.
> I am afraid the performance will be very problematic. While I use Stata for most of my work,
> fitting mixed models in Stata is somewhat problematic: it is too inefficient. Recently, a tool
> has become available to fit mixed models (including models with crossed REs) in MLwiN from
> within Stata (search for runmlwin), and the performance difference is staggering.
> ----------------------------------------------------------------------
> Yuval wrote:
> Are you sure the data file is 70GB? I'm using the Windows operating system
> and I recently succeeded in running a file of 1.29 GB that includes more than
> 4 million observations. Here are a few lines from the do-file. Just
> make sure to use the "set memory" command:
> ----------------------------------------------------------------------
> Billy wrote:
> Contrary to prior responses to your request, the set memory command is unnecessary when using Stata 12. If your dataset is 70GB, you would need at least that much RAM, in addition to the RAM your computer needs just to run.
> ----------------------------------------------------------------------
> Dan wrote:
> Once you have the 64-bit versions of the operating system and Stata, Linux vs.
> Windows won't make much difference, but you really need to establish how
> much memory you will need. Machines that offer more than 24GB of memory
> are much more expensive than smaller machines, so you can save quite a bit
> if you can limit your maximum "set memory" to 18 GB or so.
>
> If you are able to read a subset of the data into a machine you already
> have, that can give you an idea of how much memory you will need for the
> full dataset. You say "a few million observations" but unless "few" means
> thousands you should be able to get by with far less than 70GB of memory.
> You don't say how many variables, or how many are float or int. If you
> have 250 ints, you can store nearly a million observations per GB. Stata
> doesn't need much more memory than that which is used for the data.
>
> I have posted some suggestions for working with large datasets in Stata at
>   http://www.nber.org/sys-admin/large-stata-datasets.html
>
> the main point of which is that if you separate the sample selection from
> the analysis steps, it is possible to work with very large datasets in
> reasonable core sizes (if the analysis is only on a subset, of course).
>
> There is some information on the Stata website:
>   http://www.stata.com/support/faqs/win/winmemory.html
>   http://www.stata.com/support/faqs/data/dataset.html
>
> It is possible to get computers with up to 256 GB of memory for
> reasonable prices (for some definitions of reasonable, such as
> $US25,000) and that can be convenient. It probably isn't necessary,
> though.
> ----------------------------------------------------------------------
> Joerg wrote:
> What are "a few millions"? If by that you mean a handful, then you
> must have a ton of variables. If you do not need all of them for your
> analyses, you can read the data in chunks, set up the variables you
> need, and eventually put it together again. However, in my experience
> it seems difficult to fit more complicated multilevel models in Stata
> when sample size becomes large. I find this to be especially true in
> the case of models with crossed random effects. So just beware, even
> if you get all the data you want into memory, you may not be able to
> run the model you propose.
> ----------------------------------------------------------------------
> Buzz wrote:
> I concur with Joerg Luedicke's Statalist response to your question. My
> experience is similar to his in that large, complicated multilevel models may
> be extremely time-consuming to fit.
>
> See http://www.stata.com/statalist/archive/2010-09/msg00424.html
> which indicates one problem, although cluster-robust SEs are available in
> -xtmixed- for Stata 12.
>
> Also, there will be little advantage from MP processing for -xtmixed-.
> See page 33 at http://www.stata.com/statamp/statamp.pdf
> ----------------------------------------------------------------------
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
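
As an aside on Dan's rule of thumb above (250 ints, nearly a million observations per GB), the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate. The byte widths are Stata's numeric storage types; the helper names (bytes_per_observation, observations_per_gib, estimated_gib) and the 5-million-row example are made up for illustration, and the estimate ignores string variables and any overhead Stata needs beyond the data itself. Note that "nearly a million per GB" corresponds to 4-byte variables (long or float); with 2-byte Stata ints roughly twice as many observations would fit.

```python
# Rough memory estimate for a Stata dataset held entirely in RAM.
# Assumption: only numeric variables, no per-observation overhead.

# Bytes per value for Stata's numeric storage types.
STATA_TYPE_BYTES = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}

GIB = 1024 ** 3  # bytes in one GiB


def bytes_per_observation(var_counts):
    """Width of one observation, given {storage type: number of variables}."""
    return sum(STATA_TYPE_BYTES[t] * n for t, n in var_counts.items())


def observations_per_gib(var_counts):
    """How many whole observations fit in one GiB."""
    return GIB // bytes_per_observation(var_counts)


def estimated_gib(n_obs, var_counts):
    """Approximate GiB needed to hold n_obs observations."""
    return n_obs * bytes_per_observation(var_counts) / GIB


if __name__ == "__main__":
    print(observations_per_gib({"long": 250}))  # 1073741
    print(observations_per_gib({"int": 250}))   # 2147483
    # Hypothetical example: 5 million rows of 250 float variables.
    print(round(estimated_gib(5_000_000, {"float": 250}), 2))  # 4.66
```

On these assumptions, a few million observations of a few hundred numeric variables come to single-digit gigabytes, which is consistent with Dan's point that far less than 70GB may be needed unless the number of variables is very large.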



-- 
Dr. Yuval Arbel
School of Business
Carmel Academic Center
4 Shaar Palmer Street, Haifa, Israel
e-mail: [email protected]
