I'm not much into Mata, but I think you should start trying st_data()
instead st_view().
On Tue, Jul 15, 2008 at 3:50 PM, Paulo Guimaraes <[email protected]> wrote:
> Dear all,
>
> Because I am working with large datasets I tried to code in Mata the
> equivalent to the following Stata command:
>
> egen newvar=mean(var), by(id)
>
> believing that the results would be much faster. I coded two very similar
> Mata routines, which are listed below.
> To my surprise the egen command was faster. One of the Mata routines is
> amazingly slow.
> My questions are:
>
> 1) Why is mean_by2() so slow?
>
> 2) Is it possible to make the Mata code more efficient/faster than Stata's
> egen?
>
> Any comments/suggestions are appreciated
>
> Paulo Guimaraes
>
>
>
> ************************************************************************
> ************************************************************************
> set mem 500m
>
> Current memory allocation
>
> current memory usage
> settable value description (1M = 1024k)
> --------------------------------------------------------------------
> set maxvar 5000 max. variables allowed 1.909M
> set memory 500M max. data space 500.000M
> set matsize 400 max. RHS vars in models 1.254M
> -----------
> 503.163M
>
> . set obs 100000
> obs was 0, now 100000
>
> . set more off
>
> . egen id=seq(), from(1) to(10000) block(1)
>
> . gen y=uniform()
>
> . timer on 1
>
> . egen double x1=mean(y), by(id)
>
> . timer off 1
>
> . timer on 2
>
> . mata: mean_by1("y","id")
>
> . timer off 2
>
> . timer on 3
>
> . mata: mean_by2("y","id")
>
> . timer off 3
>
> . timer list
> 1: 0.59 / 1 = 0.5940
> 2: 65.81 / 1 = 65.8130
> 3: 0.86 / 1 = 0.8590
>
> *******************************************************************************************
> *******************************************************************************************
>
> mata:
> void function mean_by1(string scalar var, string scalar groupid)
> {
> real colvector order, means, ym
> numeric matrix data, info, X
> st_view(id=.,.,groupid)
> st_view(y=.,.,var)
> order=J(rows(y),1,1)
> order=runningsum(order)
> data=(order,y,id)
> _sort(data,3)
> info=panelsetup(data[.,3],1)
> means=J(rows(info),1,0)
> ym=J(rows(data),1,0)
> for (i=1; i<=rows(info); i++) {
> X=panelsubmatrix(data[.,2],i,info)
> means[i,1]=mean(X)
> }
> for (i=1; i<=rows(info); i++) {
> means[i,1]=mean(panelsubmatrix(data[.,2],i,info))
> }
> data=(data[.,1],ym,data[.,3])
> _sort(data,1)
> index1=st_addvar("double","_y1")
> st_store((1,rows(y)),index1,data[.,2])
> }
> end
>
> mata:
> void function mean_by2(string scalar var, string scalar groupid)
> {
> real colvector order, means, ym
> numeric matrix data, info, X
> st_view(id=.,.,groupid)
> st_view(y=.,.,var)
> order=J(rows(y),1,1)
> order=runningsum(order)
> data=(order,y,id)
> _sort(data,3)
> info=panelsetup(data[.,3],1)
> means=J(rows(info),1,0)
> ym=J(rows(data),1,0)
> for (i=1; i<=rows(info); i++) {
> X=data[| info[i,1],2\info[i,2],2|]
> means[i,1]=mean(X)
> }
> for (i=1; i<=rows(info); i++) {
> for (j=info[i,1]; j<=info[i,2]; j++) {
> ym[j,1]=means[i,1]
> }
> }
> data=(data[.,1],ym,data[.,3])
> _sort(data,1)
> index1=st_addvar("double","_y2")
> st_store((1,rows(y)),index1,data[.,2])
> }
> end
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/