Re: st: RE: -expand-, -expandcl-, and -set mem-; limit to the number of obs?
From: "Eric A. Booth" <[email protected]>
To: [email protected]
Subject: Re: st: RE: -expand-, -expandcl-, and -set mem-; limit to the number of obs?
Date: Sun, 11 Oct 2009 18:48:37 -0500
Misha wrote:
Sometimes I can only set the memory to 16g
(if I ask for more I get the "op. sys. refuses to provide memory"
message); sometimes I can get only 32g; and sometimes I can get 100g.
What could be the problem?
It sounds like you've got Stata set to use virtual memory (-set
virtual on-), which is why you are seeing different results from
-set mem-. As Martin's link to my posting indicates, you can 'step up'
your memory allocation to get the most out of it, but your system will
still limit what is allocated based on how much of your physical RAM and
your virtual memory swap/page file space (e.g., hard drive or mounted
drive space) is available. So, my guess is that you get 32GB, instead
of 100GB, when other processes are using your machine's resources,
leaving less available for virtual memory. Check your Task Manager in
Windows or Activity Monitor in Mac OS (or type "top -c" in *nix) while
you are trying to open the dataset. How much physical RAM is installed
on your machine?
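A minimal sketch of that 'step up' approach (the sizes below are
placeholders; adjust them to your machine):
**********
* increase the allocation in steps rather than requesting the
* maximum at once; -query memory- reports the current settings
clear all
set virtual on
set mem 8g
set mem 16g
set mem 32g
query memory
**********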
Misha wrote:
"Why am I asking for so much memory?", you might ask. Well, I have a
data set that, when expanded, ought to give me about 2.63e+09 (i.e.,
nearly three billion) observations.
How large is the .dta file you are using (not in observations, but in
terms of disk space)?
Keep in mind that the "size" of your dataset is more than just the
number of observations. How many variables are in the dataset? How
many characters/digits are in your variables? Are there labels,
notes, or other characteristics stored in your .dta file? All of
these will contribute to the amount of memory needed to open the file
in Stata (this is also why it is difficult to do a simple "back-of-the-
envelope" calculation of the exact amount of memory you need). For
example, look at the difference in the size (% of memory free) reported
by -describe- in these two examples, where the number of variables is
increased by only one:
**********
set virtual on
*
* Example 1: one million obs plus a single float (4-byte) variable
clear all
set mem 1g
set obs 1000000
desc                 // note the "% of memory free" before...
gen i = 1
desc                 // ...and after adding one float variable
*
* Example 2: the same obs with 244-character string variables
clear all
set mem 1g
set obs 1000000
gen str244 i = "a"
desc                 // memory free with one str244 variable...
gen str244 i2 = "a"
desc                 // ...and after adding just one more
**********
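The width of each variable is what drives the difference: a float takes
4 bytes per observation and a str244 takes 244 bytes. A rough sketch of
that arithmetic (it ignores Stata's fixed overhead, so treat it as a
lower bound):
**********
display %12.0fc 1000000 * 4        // one float variable:   ~4 million bytes
display %12.0fc 1000000 * 244 * 2  // two str244 variables: ~488 million bytes
**********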
Compressing (-compress-) the dataset or recasting (-recast-) the
variables can help if the dataset is near the memory limit, but if it
is that large, you should probably consider loading only the variables
you need during each step of the analysis, by specifying the varlist
in the -use- command, or breaking your dataset up into smaller chunks
if that's possible. See the sketch below.
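A minimal sketch (the filename and variable names here are
hypothetical):
**********
* load only the variables needed for this step, then shrink storage types
use id year outcome using mydata.dta, clear
compress
**********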
Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
On Oct 11, 2009, at 5:39 AM, Martin Weiss wrote:
<>
Look at http://www.stata.com/statalist/archive/2009-07/msg00899.html
and http://www.stata.com/support/faqs/win/winmemory.html
HTH
Martin
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Misha Spisok
Sent: Sunday, 11 October 2009 11:51
To: [email protected]
Subject: st: -expand-, -expandcl-, and -set mem-; limit to the number of obs?
Hello, Statalist!
I have a few questions about Stata's ability to handle billions of
observations.
On the Stata webpage, "Which Stata is right for me," it indicates that
the number of observations is unlimited for Stata versions other than
Small Stata.
The network computer I'm using has Stata 11.0 SE and claims to have
113,000MB of RAM available. At one point I managed to set the memory
to 100g. However, on subsequent tries (after logging out and logging
in), I get mixed results. Sometimes I can only set the memory to 16g
(if I ask for more I get the "op. sys. refuses to provide memory"
message); sometimes I can get only 32g; and sometimes I can get 100g.
What could be the problem?
"Why am I asking for so much memory?", you might ask. Well, I have a
data set that, when expanded, ought to give me about 2.63e+09 (i.e.,
nearly three billion) observations. Whether I use -expand- or
-expandcl- I run into the same problem, getting the message "no room
to add more observations, etc." I've compressed and dropped as much
as I can, but I still get this problem. Am I asking too much of Stata
and/or "only" 113,000MB of RAM? Is there a back-of-the-envelope way
to calculate how much RAM I would need to hold a given dataset?
Thank you for your time and attention.
Misha
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/