Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Memory requirements for factor variables
From: Reifschneider Harry III <[email protected]>
To: [email protected]
Subject: Re: st: Memory requirements for factor variables
Date: Sun, 2 May 2010 21:59:51 -0700
Judging by the code and log below, -set mem 10m- is simply not enough.
If you have 8G available on your machine, just increase -set mem- to a
higher level. I use 750m, set permanently, and haven't yet run into any
shortages.
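A minimal sketch of that change, assuming Stata 11 or earlier, where the data area is fixed by -set memory- and can only be resized while no data are loaded. Stata 12 and later manage memory automatically and ignore -set memory-. The 750m figure is the one quoted above; choose a value that fits your RAM.

```stata
* Stata 11 and earlier: the data area can only be resized with no data loaded
clear all
set memory 750m, permanently   // permanently: also use 750m in future sessions
memory                         // report the new allocation
```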
Cheers!
On May 2, 2010, at 3:29 PM, Partha Deb wrote:
Hi all,
I'm working with a large dataset and am running into the limits of
RAM on my machine (8G). I run into this problem when I try to
create about 500 indicator variables from a set of categorical
variables. If I had only one categorical variable from which to
create the indicators, I would do this directly in my -regress-
command.
regress y x i.D
The example below shows that using -i.varname- is considerably more
memory-efficient than generating the indicators manually before
-regress-, i.e., than doing
forvalues i=1/100 {
    gen byte ID`i' = (D==`i')
}
If I had only one categorical variable to deal with, I would
obviously use -i.varname-. But I need to do something like
forvalues i=1/100 {
    gen byte ID`i' = (D1==`i' | D2==`i' | D3==`i' | D4==`i')
}
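A hedged sketch of the same loop written with Stata's inlist() function, on simulated data (the D1-D4 values are made up and coded 1-100). inlist(`i', D1, D2, D3, D4) returns exactly the same 0/1 result as the chain of | comparisons, so it does not shrink the one byte per indicator per observation. Any memory relief comes from -compress- and from dropping variables the regression does not need, the first two alternatives Stata lists in its r(902) message.

```stata
* Sketch on simulated data: D1-D4 stand in for the real categorical variables
clear
set obs 1000
set seed 12345
forvalues k = 1/4 {
    generate int D`k' = 1 + floor(100*runiform())   // categories 1-100
}
compress                    // D1-D4 fit in byte, saving 1 byte each per obs
display "bytes needed for 100 byte indicators: " %12.0fc 100*_N
forvalues i = 1/100 {
    * ID`i' = 1 if any of D1-D4 equals `i'; same result as the | version
    generate byte ID`i' = inlist(`i', D1, D2, D3, D4)
}
describe, short
```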
How can I achieve this in a more memory-efficient way? Thanks a
lot. The example do-file and log are below.
Partha
****** do *******
clear all
set mem 10m
set more off
set seed 123456
set obs 100000
gen x = rnormal()
gen u = rnormal()
gen int d = int(_n/1000)
gen y = x + u
describe,s
qui regress y x i.d
sum d
forvalues i=1/`r(max)' {
    gen byte Id`i' = (d==`i')
}
describe,s
regress y x Id*
exit
******* log **********
. clear all
. set mem 10m
Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory          10M     max. data space                 10.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                                13.163M
. set more off
.
. set seed 123456
.
. set obs 100000
obs was 0, now 100000
.
. gen x = rnormal()
. gen u = rnormal()
. gen int d = int(_n/1000)
.
. gen y = x + u
.
. describe,s
Contains data
  obs:       100,000
 vars:             4
 size:     2,200,000 (82.8% of memory free)
Sorted by:
     Note:  dataset has changed since last saved
.
. qui regress y x i.d
.
. sum d
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           d |    100000      49.501    28.86623          0        100
.
. forvalues i=1/`r(max)' {
2. gen byte Id`i' = (d==`i')
3. }
no room to add more variables because of width
    An attempt was made to add a variable that would have increased the memory
    required to store an observation beyond what is currently possible.  You
    have the following alternatives:

    1.  Store existing variables more efficiently; see help compress.

    2.  Drop some variables or observations; see help drop.  (Think of Stata's
        data area as the area of a rectangle; Stata can trade off width and
        length.)

    3.  Increase the amount of memory allocated to the data area using the
        set memory command; see help memory.
r(902);
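A back-of-the-envelope check of why r(902) appears, using only figures from the log (1M = 1024k = 1,048,576 bytes, as the memory table states). The 100 new byte indicators need one byte each per observation, which is more than the free part of the 10M data area:

$$100 \times 1\ \text{byte} \times 100{,}000\ \text{obs} = 10{,}000{,}000\ \text{bytes} \approx 9.54\text{M}$$

$$\text{free} \approx 0.828 \times 10.000\text{M} \approx 8.28\text{M} < 9.54\text{M}$$

The earlier -regress y x i.d- ran without error in the same 10M, consistent with factor-variable indicators not being stored as variables in the dataset.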
--
Partha Deb
Professor of Economics
Hunter College
ph: (212) 772-5435
fax: (212) 772-5398
http://urban.hunter.cuny.edu/~deb/
Emancipate yourselves from mental slavery
None but ourselves can free our minds.
- Bob Marley
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*