
st: -egen- and mata (follow up)

From	Antoine Terracol <[email protected]>
To	[email protected]
Subject	st: -egen- and mata (follow up)
Date	Wed, 25 Feb 2009 15:27:00 +0100

Dear _all,

There was a thread a few months ago about the relative speed of 
-egen- and Mata. See 
http://www.stata.com/statalist/archive/2008-07/msg00550.html and Bill 
Gould's reply: http://www.stata.com/statalist/archive/2008-07/msg00582.html

I ran similar tests before discovering that thread, and I would like to 
add a few comments in case anyone is interested.

The apparent superiority of -egen- seems to come from the efficiency 
of -bysort-: the Mata equivalent of -egen total- runs faster than 
-egen-, but the Mata equivalent of -by id : egen total- does not.

More importantly (I think), even though Mata is slower than -by id : 
egen total- for a single calculation, it is faster when one wants the 
totals of several variables, because Mata can compute them all in a 
single call, whereas -egen- has to be run once per variable.
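
To make the comparison concrete, here is a minimal sketch of the two 
approaches. The variable names (id, x1-x5, tot_*, s1-s5) are made up for 
illustration, and sommeby() is the Mata function listed further down in 
this message; it must be defined before the last line is run.

```stata
* Sketch only: id, x1-x5 and the new variable names are hypothetical
sort id

* -egen- needs one command (one pass over the data) per variable
foreach v of varlist x1-x5 {
    by id: egen double tot_`v' = total(`v')
}

* the Mata function handles all five variables in a single call
mata: sommeby("id", "x1 x2 x3 x4 x5", "s1 s2 s3 s4 s5")
```

Both versions assume the data are sorted by id.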

Here is a summary of the timings (in seconds) on my machine:

panel dataset, 1500 ids*300 periods

-----------------------------------------
Total sum of a single variable, no -by-

mata: 0.1090
egen: 0.7660
-----------------------------------------

-----------------------------------------
Total sum of a single variable, with -by-

mata: 0.7340
egen: 0.5780
-----------------------------------------

-----------------------------------------
Total sums of 5 variables, no -by-

mata: 0.4850
egen: 3.8900
-----------------------------------------

-----------------------------------------
Total sums of 5 variables, with -by-

mata: 1.0310
egen: 2.8900
-----------------------------------------


Somewhat surprisingly, the advantage of Mata is even larger for the 
equivalent of -egen min-: Mata is always faster there, even with -by- 
and a single variable:


-----------------------------------------
Min of a single variable, no -by-

mata: 0.1090
egen: 2.8750
-----------------------------------------

-----------------------------------------
Min of a single variable, with -by-

mata: 0.7500
egen: 2.6560
-----------------------------------------

-----------------------------------------
Mins of 5 variables, no -by-

mata: 0.4850
egen: 14.6250
-----------------------------------------

-----------------------------------------
Mins of 5 variables, with -by-

mata: 1.0620
egen: 13.6410
-----------------------------------------
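
For anyone who wants to run a comparison of this kind, one option is 
Stata's -timer- command. The sketch below is only an assumption about a 
reasonable setup, not necessarily how the numbers above were produced: 
the data are simulated, the variable names are hypothetical, and 
sommeby() (listed below) must already be defined.

```stata
* Sketch only: simulated panel, 1500 ids * 300 periods
clear
set obs 450000
generate long id = ceil(_n/300)
generate double x1 = runiform()
sort id

timer clear

timer on 1
by id: egen double tot_x1 = total(x1)
timer off 1

timer on 2
mata: sommeby("id", "x1", "s1")
timer off 2

* timer 1 = -egen-, timer 2 = Mata
timer list
```

Timings from a single run can be noisy, so repeating each block a few 
times before reading -timer list- may give a steadier picture.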





Here is the Mata code I used; I do not claim it is the most 
efficient one could think of...


/*-----total sum, no by------*/
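mata:
// in, out: space-separated lists of existing and new variable names
void somme(string vector in , string vector out ){
    st_view(x, ., (tokens(in)))              // view onto the input variables
    sx = J(rows(x), 1, colsum(x))            // column totals repeated on every row
    idx = st_addvar("float", (tokens(out)))  // create the output variables
    idx                                      // display the new variable indices
    st_store(. , idx , sx)                   // write the totals to the dataset
}
end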
/*-----------------------------*/






/*------total sum, with by------*/
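mata:
// p: group identifier (observations of each group must be contiguous)
// in, out: space-separated lists of existing and new variable names
void sommeby(string scalar p , string vector in , string vector out ){
    st_view(id, ., p)
    V = panelsetup(id, 1)              // one row per group: first and last obs.
    st_view(x, ., (tokens(in)))
    sx = J(rows(x), cols(x), .)        // results, filled in group by group
    for (i=1; i<=rows(V); i++) {
        panelsubview(X, x, i, V)       // view onto the rows of group i
        // group totals repeated on every row of the group
        sx[V[i,1]::V[i,2],.] = J(rows(X), 1, colsum(X))
    }
    idx = st_addvar("float", (tokens(out)))
    st_store(. , idx , sx)
}
end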
/*-----------------------------*/
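
To see what panelsetup() and the loop in sommeby() do, here is a toy 
example that can be run interactively. The vectors id and x are made up, 
and panelsubmatrix() is used instead of panelsubview() so that X is a 
copy rather than a view; this is an illustration, not the code that was 
timed.

```stata
mata:
id = (1\1\1\2\2\3)        // toy group identifier, already sorted
x  = (10\20\30\1\2\5)     // toy data
V  = panelsetup(id, 1)    // one row per group: (first row, last row)
V                         // displays (1,3 \ 4,5 \ 6,6)
for (i=1; i<=rows(V); i++) {
    X = panelsubmatrix(x, i, V)   // rows of x belonging to group i
    colsum(X)                     // displays the group total
}
end
```

Each row of V gives the first and last observation of one group, which 
is why the by-group functions can write a whole group's result into rows 
V[i,1] through V[i,2] in one assignment.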






/*---------minimum, no by---------*/
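mata:
// in, out: space-separated lists of existing and new variable names
void mmin(string vector in , string vector out ){
    st_view(x, ., (tokens(in)))              // view onto the input variables
    sx = J(rows(x), 1, colmin(x))            // column minima repeated on every row
    idx = st_addvar("float", (tokens(out)))  // create the output variables
    idx                                      // display the new variable indices
    st_store(. , idx , sx)                   // write the minima to the dataset
}
end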
/*---------------------------*/



/*----------minimum, with by----------*/
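mata:
// p: group identifier (observations of each group must be contiguous)
// in, out: space-separated lists of existing and new variable names
void mminby(string scalar p , string vector in , string vector out ){
    st_view(id, ., p)
    V = panelsetup(id, 1)              // one row per group: first and last obs.
    st_view(x, ., (tokens(in)))
    sx = J(rows(x), cols(x), .)        // results, filled in group by group
    for (i=1; i<=rows(V); i++) {
        panelsubview(X, x, i, V)       // view onto the rows of group i
        // group minima repeated on every row of the group
        sx[V[i,1]::V[i,2],.] = J(rows(X), 1, colmin(X))
    }
    idx = st_addvar("float", (tokens(out)))
    st_store(. , idx , sx)
}
end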
/*----------------------------------*/



Best,

Antoine
