st: Re: memory problem where over 50% of memory are free
First, you will likely need to allocate more memory when working with a large 
and narrow dataset like this.
Second,  I would try a more direct approach to the calculations whenever 
possible due to memory and speed considerations.  The -egen- command may 
actually create several variables during its execution, including a temp 
variable to hold the result until it's done, a variable to flag the sample to 
include in the calculations, a pseudo variable used by Stata for the sorting, 
and perhaps even another copy of the original variable (I haven't checked the 
max() code, but since it accepts expressions it may create the result of that 
expression).  Anyway, if you don't have missing values on year2, it would be 
much more memory efficient (and faster to execute) to use:
sort persnr year2
* after the sort, the last observation within each persnr holds the largest year2
by persnr: gen int maxyear2=year2[_N]
If you do have missing values on year2, it becomes a little more complicated and 
you will need to generate a byte variable to track those observations and issue 
a few extra commands:
gen byte touse=year2<.
sort persnr touse year2
by persnr touse: gen int maxyear2=year2[_N] if touse
drop touse
bysort persnr (maxyear2): replace maxyear2=maxyear2[1]
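The missing-value case can be checked on a few made-up rows before running it on 21 million observations. This is only a sketch under stated assumptions: the toy data and the variable name egenmax are invented for illustration, the group maximum is taken as the last nonmissing observation after sorting (year2[_N]), and -egen- is used only as a reference result on the small sample.

```stata
* Toy data, made up for illustration: three persons, two missing values of year2.
clear
input long persnr int year2
1 1990
1 1995
1 .
2 2001
2 1999
3 .
end

* Reference result from -egen-, which ignores missing values.
* egenmax is a hypothetical name used only for this check.
by persnr, sort: egen int egenmax = max(year2)

* Direct approach: flag nonmissing rows, sort them after the missing rows
* within each person, and take the last observation of the flagged group.
gen byte touse = year2 < .
sort persnr touse year2
by persnr touse: gen int maxyear2 = year2[_N] if touse
drop touse

* Spread the result to the rows where year2 was missing
* (missing values sort last, so the first row holds the maximum, if any).
bysort persnr (maxyear2): replace maxyear2 = maxyear2[1]

* Stops with an error if the two results differ anywhere.
assert maxyear2 == egenmax
list, sepby(persnr)
```

If the -assert- passes silently, the direct commands agree with -egen- on the toy data, including the person whose year2 is missing in every row.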
I find that I often need only about twice the required minimum memory to work 
with big datasets, but if the datasets are very narrow, like yours, I often need 
triple the required memory because some commands need to add the equivalent of 
several more variables while they are executing.
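To put rough numbers on "several more variables", the arithmetic below multiplies the observation count from the -describe- output in the quoted message by Stata's standard storage widths (byte 1, int 2, long and float 4, double 8 bytes). It is a back-of-the-envelope sketch, not a statement of what -egen- actually allocates internally.

```stata
* Rough arithmetic only: bytes needed for ONE extra variable of each type.
* The observation count comes from the quoted -describe- output.
local N = 21041596
display "one extra byte variable:   " %14.0fc `N'*1 " bytes"
display "one extra int variable:    " %14.0fc `N'*2 " bytes"
display "one extra long or float:   " %14.0fc `N'*4 " bytes"
display "one extra double variable: " %14.0fc `N'*8 " bytes"
```

On this dataset a single extra int comes to 42,083,192 bytes, so a command that needs several temporary variables at once can use up a large share of the free memory.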
Michael Blasnik.
----- Original Message ----- 
From: "Stephan Brunow" <[email protected]>
To: <[email protected]>
Sent: Tuesday, April 10, 2007 5:10 AM
Subject: st: memory problem where over 50% of memory are free
Dear Statalisters,
I have a problem concerning memory. I have a quite large
dataset. If I use just 6 variables,
obs:    21,041,596
vars:             6
size:   336,665,536 (56.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
persnr          long   %12.0g
year1           int    %8.0g
month1          byte   %8.0g
year2           int    %8.0g
month2          byte   %8.0g
util            int    %8.0g
-------------------------------------------------------------------------------
I set the memory quite large:
<snip>
Over 50% of the allocated memory is free, so there should be enough room
to generate 2 or 3 integer variables. However, if I do the following, I
receive the error message that there is no room to add a variable due to
width. I can neither compress the data nor drop variables, since the data
are already compressed and I need these 6 variables.
Here is the command:
. by persnr, sort: egen int maxyear2=max(year2)
What might be the problem, what should I do?