st: Re: memory problem where over 50% of memory are free
First, you will likely need to allocate more memory when working with a large
and narrow dataset like this.
Second, I would try a more direct approach to the calculations whenever
possible due to memory and speed considerations. The -egen- command may
actually create several variables during its execution, including a temp
variable to hold the result until it's done, a variable to flag the sample to
include in the calculations, a pseudo variable used by Stata for the sorting,
and perhaps even another copy of the original variable (I haven't checked the
max() code, but since it accepts expressions it may create the result of that
expression). Anyway, if you don't have missing values on year2, it would be
much more memory efficient (and faster to execute) to sort and then pick up
the last observation within each persnr, which holds the largest year2:
sort persnr year2
by persnr: gen int maxyear2=year2[_N]
If you do have missing values on year2, it becomes a little more complicated and
you will need to generate a byte variable to track those observations and issue
a few extra commands:
gen byte touse=year2<.
sort persnr touse year2
by persnr touse: gen int maxyear2=year2[_N] if touse
drop touse
bysort persnr (maxyear2): replace maxyear2=maxyear2[1]
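Before running by-group logic like this on 21 million observations, it may be worth trying it on a toy dataset and comparing the result with -egen-. The following is only a sketch: the six observations and the variable name ref are made up for illustration, and it uses year2[_N] (the last observation within each sorted group) to pick out the maximum.

```stata
clear
* Hypothetical toy data: person 1 has one missing year2, person 3 has only missing
input long persnr int year2
1 1999
1 2001
1 .
2 2003
2 2000
3 .
end

* Reference result from -egen-, which ignores missing values
by persnr, sort: egen int ref = max(year2)

* Direct approach: flag nonmissing values, sort them last within persnr,
* and take the last observation of each nonmissing group
gen byte touse = year2 < .
sort persnr touse year2
by persnr touse: gen int maxyear2 = year2[_N] if touse
drop touse

* Spread the group maximum to the observations where year2 was missing
* (missing values sort last, so maxyear2[1] is nonmissing whenever one exists)
bysort persnr (maxyear2): replace maxyear2 = maxyear2[1]

* Stops with an error if the two results differ anywhere
assert maxyear2 == ref
list, sepby(persnr)
```

If the -assert- passes silently, the two approaches agree on the toy data, including the case where a person has no nonmissing year2 at all.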
I find that I often need only about twice the required minimum memory to work
with big datasets, but if the datasets are very narrow, like yours, I often need
triple the required memory because some commands need to add the equivalent of
several more variables while they are executing.
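A rough way to see why narrow datasets need so much headroom is to work out what each extra variable costs. The sketch below uses only the obs and size figures from the -describe- output in the original message. The choice of byte, int and double as example types is an assumption about what a command might add temporarily, not something checked against the -egen- code.

```stata
* Figures copied from the -describe- output in the original message
local obs  = 21041596
local size = 336665536

* Bytes currently used per observation
display "bytes per observation: " `size'/`obs'

* Approximate extra memory, in bytes, for one added variable of each type
display "one more byte:   " %15.0fc 1*`obs'
display "one more int:    " %15.0fc 2*`obs'
display "one more double: " %15.0fc 8*`obs'
```

With only 16 bytes per observation to start with, a single temporary double would add half as much again to the size of the data, which is why a few working variables can use up the free memory on a narrow dataset much faster than on a wide one.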
Michael Blasnik.
----- Original Message -----
From: "Stephan Brunow" <[email protected]>
To: <[email protected]>
Sent: Tuesday, April 10, 2007 5:10 AM
Subject: st: memory problem where over 50% of memory are free
Dear Statalisters,
I have a problem concerning memory. I have a quite large dataset. If I use
just 6 variables,
  obs:    21,041,596
 vars:             6
 size:   336,665,536 (56.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
persnr          long   %12.0g
year1           int    %8.0g
month1          byte   %8.0g
year2           int    %8.0g
month2          byte   %8.0g
util            int    %8.0g
-------------------------------------------------------------------------------
I set the memory quite large:
<snip>
At least, over 50% of the allowed memory is free. There should be enough room
to generate 2 or 3 integer variables. However, if I do the following I
receive the error message that there is no room to add a variable due to
width. I can neither compress the data nor drop variables, since the data are
already compressed and I need these 6 variables.
Here is the command:
. by persnr, sort: egen int maxyear2=max(year2)
What might be the problem, what should I do?