[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: egen and Mata

From	Paulo Guimaraes <[email protected]>
To	[email protected]
Subject	st: egen and Mata
Date	Tue, 15 Jul 2008 14:50:02 -0400

Dear all,

Because I am working with large datasets I tried to code in Mata the equivalent to the following Stata command:

egen newvar=mean(var), by(id)

believing that the results would be much faster. I coded two very similar Mata routines, which are listed below.
To my surprise the egen command was faster. One of the Mata routines is amazingly slow.
My questions are:

1) Why is mean_by2() so slow?

2) Is it possible to make the Mata code more efficient/faster than Stata's egen?

Any comments/suggestions are appreciated

Paulo Guimaraes

************************************************************************
************************************************************************
set mem 500m

Current memory allocation

current memory usage
settable value description (1M = 1024k)
--------------------------------------------------------------------
set maxvar 5000 max. variables allowed 1.909M
set memory 500M max. data space 500.000M
set matsize 400 max. RHS vars in models 1.254M
-----------
503.163M

. set obs 100000
obs was 0, now 100000

. set more off

. egen id=seq(), from(1) to(10000) block(1)

. gen y=uniform()

. timer on 1

. egen double x1=mean(y), by(id)

. timer off 1

. timer on 2

. mata: mean_by1("y","id")

. timer off 2

. timer on 3

. mata: mean_by2("y","id")

. timer off 3

. timer list
1: 0.59 / 1 = 0.5940
2: 65.81 / 1 = 65.8130
3: 0.86 / 1 = 0.8590

*******************************************************************************************
*******************************************************************************************

mata:
void function mean_by1(string scalar var, string scalar groupid)
{
real colvector order, means, ym
numeric matrix data, info, X
st_view(id=.,.,groupid)
st_view(y=.,.,var)
order=J(rows(y),1,1)
order=runningsum(order)
data=(order,y,id)
_sort(data,3)
info=panelsetup(data[.,3],1)
means=J(rows(info),1,0)
ym=J(rows(data),1,0)
for (i=1; i<=rows(info); i++) {
X=panelsubmatrix(data[.,2],i,info)
means[i,1]=mean(X)
}
for (i=1; i<=rows(info); i++) {
means[i,1]=mean(panelsubmatrix(data[.,2],i,info))
}
data=(data[.,1],ym,data[.,3])
_sort(data,1)
index1=st_addvar("double","_y1")
st_store((1,rows(y)),index1,data[.,2])
}
end

mata:
void function mean_by2(string scalar var, string scalar groupid)
{
real colvector order, means, ym
numeric matrix data, info, X
st_view(id=.,.,groupid)
st_view(y=.,.,var)
order=J(rows(y),1,1)
order=runningsum(order)
data=(order,y,id)
_sort(data,3)
info=panelsetup(data[.,3],1)
means=J(rows(info),1,0)
ym=J(rows(data),1,0)
for (i=1; i<=rows(info); i++) {
X=data[| info[i,1],2\info[i,2],2|]
means[i,1]=mean(X)
}
for (i=1; i<=rows(info); i++) {
for (j=info[i,1]; j<=info[i,2]; j++) {
ym[j,1]=means[i,1]
}
}
data=(data[.,1],ym,data[.,3])
_sort(data,1)
index1=st_addvar("double","_y2")
st_store((1,rows(y)),index1,data[.,2])
}
end

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: st: egen and Mata
  - From: "C�lio J�nior" <[email protected]>

Prev by Date: st: -denton- updated
Next by Date: Re: st: egen and Mata
Previous by thread: st: -denton- updated
Next by thread: Re: st: egen and Mata
Index(es):
- Date
- Thread