Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Memory requirements for factor variables
From: Federico Belotti <[email protected]>
To: [email protected]
Subject: Re: st: Memory requirements for factor variables
Date: Mon, 3 May 2010 10:54:30 +0200
Partha,
I don't think there is a way to do that in Stata itself. An alternative could be Mata, although you would then have to code the estimator for your econometric model yourself. An example using OLS is below.
HTH
Federico
****** do *******
clear all
set mem 10m
set more off
set seed 123456
set obs 100000
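mata
// OLS of y on x, the indicators "any of d1-d4 equals i" (i = 1..cols), and a constant
real matrix factor_reg(rows,cols,d1,d2,d3,d4,x,y) {
    D = J(rows,cols,0)
    for(i=1;i<=cols;i++) {
        for(j=1;j<=rows;j++) {
            if (d1[j]==i | d2[j]==i | d3[j]==i | d4[j]==i) D[j,i]=1
        }
    }
    X = x,D,J(rows,1,1)          // regressors: x, indicators, constant
    Y = y
    beta = invsym(X'X)*(X'Y)
    return(beta)
}
end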
gen x = rnormal()
gen u = rnormal()
gen int d = int(_n/1000)
gen int d1 = int(_n/1100)
gen int d2 = int(_n/1200)
gen int d3 = int(_n/1300)
gen int d4 = int(_n/1400)
sum
gen y = x + u
describe,s
regress y x i.d
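sum d
* keep the maximum now: later commands may overwrite r()
local K = r(max)
* -tomata- is a user-written command (ssc install tomata)
tomata
mata: factor_reg(100000,`K',d1,d2,d3,d4,x,y)
forvalues i=1/`K' {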
gen byte Id`i' = (d1==`i' | d2==`i' | d3==`i' | d4==`i')
}
describe,s
regress y x Id*
exit
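A possible variant of the Mata function above, as a sketch only. It reads the variables with st_data(), so it does not rely on -tomata-, and it builds each indicator column in one vectorized step instead of looping over observations. The name factor_reg2, its arguments, and the choice of 100 categories are assumptions made for illustration. It ignores missing values and, like the example above, holds the whole regressor matrix in Mata, so it works around the -set memory- limit on the data area rather than reducing total RAM use. Run it after the data-generating lines above and before -exit-.

****** do *******
mata:
// Hypothetical helper: name and signature are illustrative assumptions
real colvector factor_reg2(string scalar yvar, string scalar xvar,
                           string scalar dvars, real scalar ncat)
{
    real matrix    d, X
    real colvector y, x
    real scalar    i

    y = st_data(., yvar)
    x = st_data(., xvar)
    d = st_data(., tokens(dvars))     // one column per categorical variable

    X = J(rows(y), ncat + 2, 1)       // last column stays 1: the constant
    X[., 1] = x
    for (i = 1; i <= ncat; i++) {
        // 1 if any of the categorical variables equals i in that observation
        X[., i + 1] = rowmax(d :== i)
    }
    // OLS coefficients in the order: x, indicators 1..ncat, constant
    return(invsym(quadcross(X, X)) * quadcross(X, y))
}
end

mata: factor_reg2("y", "x", "d1 d2 d3 d4", 100)
****** end *******

The coefficients come back in the same order as in factor_reg(), so the two sets of results can be compared directly.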
--
Federico Belotti
Faculty of Economics
Department of Financial and Quantitative Economics
University of Rome Tor Vergata
tel: +39 06 7259 5624
e-mail: [email protected]
url: http://www.econometrics.it
On 3 May 2010, at 00:29, Partha Deb wrote:
> Hi all,
>
> I'm working with a large dataset and am running into the limits of RAM on my machine (8G). I run into this problem when I try to create about 500 indicator variables from a set of categorical variables. If I had only one categorical variable from which to create the indicators, I would do this directly in my -regress- command.
>
> regress y x i.D
>
> The example below shows that using -i.varname- is considerably more memory-efficient as compared to generating the indicators manually before -regress- , i.e. if one does,
>
> forvalues i=1/100 {
> gen byte ID`i' = (D==`i')
> }
>
> If I had only one categorical variable to deal with, I would obviously use -i.varname- . But I need to do something like
>
> forvalues i=1/100 {
> gen byte ID`i' = (D1==`i' | D2==`i' | D3==`i' | D4==`i')
> }
>
> How can I achieve this in a more memory-efficient way? Thanks a lot. The example do and log are below.
>
> Partha
>
> ****** do *******
> clear all
> set mem 10m
> set more off
>
> set seed 123456
>
> set obs 100000
>
> gen x = rnormal()
> gen u = rnormal()
> gen int d = int(_n/1000)
>
> gen y = x + u
>
> describe,s
>
> qui regress y x i.d
>
> sum d
>
> forvalues i=1/`r(max)' {
> gen byte Id`i' = (d==`i')
> }
>
> describe,s
>
> regress y x Id*
>
> exit
>
>
> ******* log **********
>
> . clear all
>
> . set mem 10m
>
> Current memory allocation
>
>                     current                                 memory usage
>     settable          value     description                 (1M = 1024k)
>     --------------------------------------------------------------------
>     set maxvar         5000     max. variables allowed           1.909M
>     set memory          10M     max. data space                 10.000M
>     set matsize         400     max. RHS vars in models          1.254M
>                                                             -----------
>                                                                 13.163M
>
> . set more off
>
> .
> . set seed 123456
>
> .
> . set obs 100000
> obs was 0, now 100000
>
> .
> . gen x = rnormal()
>
> . gen u = rnormal()
>
> . gen int d = int(_n/1000)
>
> .
> . gen y = x + u
>
> .
> . describe,s
>
> Contains data
>   obs:       100,000
>  vars:             4
>  size:     2,200,000 (82.8% of memory free)
> Sorted by:
>      Note:  dataset has changed since last saved
>
> .
> . qui regress y x i.d
>
> .
> . sum d
>
> Variable | Obs Mean Std. Dev. Min Max
> -------------+--------------------------------------------------------
> d | 100000 49.501 28.86623 0 100
>
> .
> . forvalues i=1/`r(max)' {
> 2. gen byte Id`i' = (d==`i')
> 3. }
> no room to add more variables because of width
> An attempt was made to add a variable that would have increased the memory required to store
> an observation beyond what is currently possible. You have the following alternatives:
>
> 1. Store existing variables more efficiently; see help compress.
>
> 2. Drop some variables or observations; see help drop. (Think of Stata's data area as the
> area of a rectangle; Stata can trade off width and length.)
>
> 3. Increase the amount of memory allocated to the data area using the set memory command;
> see help memory.
> r(902);
>
>
> --
> Partha Deb
> Professor of Economics
> Hunter College
> ph: (212) 772-5435
> fax: (212) 772-5398
> http://urban.hunter.cuny.edu/~deb/
>
> Emancipate yourselves from mental slavery
> None but ourselves can free our minds.
> - Bob Marley
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/