Shourun Guo <[email protected]> noticed some speed differences between Stata 7 and
Stata 9. Nick Cox <[email protected]> replied to some of the issues that Guo
mentioned. I'll address the substantial differences related to Guo's do-file
example:
> When I ran the following ado file on the above dataset in STATA 9 and
> STATA 7, STATA 9 is always much slower. The dataset has about 700,000 obs.
> There is a categary variable called 'group', which is continuous from 1 to
> 6250. Whith which group, there are 80-127 observations. (Different groups
> may have different number of observations). For each group, I need to run a
> regression and record the estimation coefficients. I use a loop to do the
> job. In the loop, I avoided to use -if group=`i'- because it seems -if- cost
> more time than -in- to identify the desired observations from my experience
> in STATA 7 when dealing with large dataset. Basically, I first determine the
> beginning obs and ending obs for each group and then run the regression in
> the loop using -in- condition.
>
> I did some experiments. If I keep 1000 groups, STATA 7 used 17 seconds to
> finish while STATA 9 used 54 seconds. With 3000 groups, STATA 7 used 144
> seconds while STATA 9 used 471 seconds. With all 6250 groups, STATA 7 used
> about 18 minutes, while STATA 9 used about 110 minutes. All the experiments
> are done on the same computer and without other program running. The results
> don't make sense to me. The speed shouldn't be so slow for Verison 9. It
> seems that I need to optimize my program for STATA 9. Any thoughts or
> suggestions?
>
>
> set more off
> set mem 100m
> use ./temp3, clear
> sort group
> by group: gen obsnum=_N
> by group: keep if _n==1
> keep group obsnum
> sum group
> local max=r(max)
>
> forval i=1/`max' {
> local n`i'=obsnum[`i']
> }
>
> use ./temp3, clear
> sort group
> tempname result1
> postfile `result1' id alpha beta using .\rep_beta_anndate, replace
> local base=0
>
> forval i=1/`max' {
> local first=`base'+1
> local last=`base'+`n`i''
> quietly regress ret vwretd in `first'/`last'
> post `result1' (`i') (_b[_cons]) (_b[vwretd])
> local base=`base'+`n`i''
> }
> postclose `result1'
We looked into why, in this case, -regress- is so much slower in Stata 9
compared to earlier Stata releases.
The short answer:
It turns out that there are two unnecessary sortpreserves performed for each
call to -regress- in Stata 9. We will fix this in the next ado-file update,
but in the meantime Guo can use the undocumented -_regress- command (which is
the renamed version of the originally internal -regress- command).
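As a sketch of that workaround, here is Guo's do-file with the single change
of calling -_regress- in the loop. Everything else is copied from the quoted
message, apart from the output path, which is written here without the
Windows-style ".\" prefix. It assumes, as Guo's original does, that the group
identifiers run from 1 to the maximum with no gaps.

```stata
set more off
set mem 100m

* First pass: record the number of observations in each group.
use ./temp3, clear
sort group
by group: gen obsnum=_N
by group: keep if _n==1
keep group obsnum
sum group
local max=r(max)
forval i=1/`max' {
        local n`i'=obsnum[`i']
}

* Second pass: one regression per group, selected with -in-.
use ./temp3, clear
sort group
tempname result1
postfile `result1' id alpha beta using rep_beta_anndate, replace
local base=0
forval i=1/`max' {
        local first=`base'+1
        local last=`base'+`n`i''
        * -_regress- is the undocumented built-in; it skips the ado-file layer
        quietly _regress ret vwretd in `first'/`last'
        post `result1' (`i') (_b[_cons]) (_b[vwretd])
        local base=`base'+`n`i''
}
postclose `result1'
```

Once the ado-file update is installed, the call can be changed back to
-regress-.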
The long answer:
In Stata 9, the -vce(bootstrap)- and -vce(jackknife)- options were added to a
large number of Stata's estimation commands. To facilitate this for
-regress-, the internal -regress- command was renamed to -_regress- so that an
ado-file could handle the -vce()- option (among other new features) and
call through to -_regress-. The -sortpreserve- option was used in the program
definition for -regress- in regress.ado, but it is unnecessary since -sort- is
never directly called by -regress-. There is a second unnecessary
-sortpreserve- that occurs when -regress- calls the "undocumented" routine
that parses the -vce(bootstrap)- and -vce(jackknife)- options.
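To see what -sortpreserve- costs on its own, one could compare two thin
wrappers around -_regress-, one declared with -sortpreserve- and one without.
A program declared with -sortpreserve- has the sort order of the data recorded
on entry and restored on exit, and that bookkeeping is repeated on every call.
The sketch below is only an illustration: the program names wrap_sp and
wrap_nosp are made up for this example and neither is the shipped regress.ado.
It assumes Guo's temp3 dataset is available.

```stata
* Hypothetical wrappers for illustration only; not the shipped regress.ado.
capture program drop wrap_sp
capture program drop wrap_nosp

* Declared with -sortpreserve-: sort order is recorded and restored per call.
program define wrap_sp, sortpreserve
        _regress `0'
end

* Same call-through, without the sort-order bookkeeping.
program define wrap_nosp
        _regress `0'
end

use temp3, clear
sort group

* -set rmsg on- reports the elapsed time after each command.
set rmsg on
wrap_sp ret vwretd in 1/100
wrap_nosp ret vwretd in 1/100
set rmsg off
```

Any difference in a single call may be small; it is the repetition over
thousands of groups that adds up.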
Using a simulated dataset similar to the one Guo describes above, we have
determined that the fixed -regress- command in Stata 9 will be nearly as fast
as in the previous Stata releases (there is a minuscule amount of overhead due
to -regress- being an ado-file).
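For anyone who wants to try the comparison, a dataset of roughly the shape Guo
describes can be simulated along the following lines. This is an illustrative
sketch only: the seed, the coefficients, and the error distribution are
arbitrary choices, and the file name temp3 is used simply so that the code
quoted above runs unchanged.

```stata
clear
set mem 100m                            // needed in Stata 9; later releases manage memory themselves
set seed 12345                          // arbitrary seed

* One record per group, then expand to 80-127 observations per group.
set obs 6250
gen group = _n
gen obsnum = 80 + int(48*uniform())     // uniform() is the Stata 9 name (runiform() later)
expand obsnum
sort group
drop obsnum                             // Guo's do-file generates obsnum itself

* Arbitrary regressor and response.
gen vwretd = invnorm(uniform())
gen ret = 0.01 + 1.2*vwretd + invnorm(uniform())

save temp3, replace
```

With this file in place, Guo's do-file can be timed once with -regress- and
once with -_regress- in the loop.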
-----------------------------------------------------------------------------
Note that Guo's code is equivalent to the following in Stata 9
(-statsby- uses the -in- restriction too)
. use temp3, clear
. statsby alpha=_b[_cons] beta=_b[vwretd],
        by(group) save(rep_beta_anndate, replace) :
        regress ret vwretd
(the call to -statsby- is a single line, but was broken up for aesthetics)
For faster results (while waiting for the next ado-file update) Guo can use
the following in Stata 9
. statsby alpha=_b[_cons] beta=_b[vwretd],
        by(group) save(rep_beta_anndate, replace) :
        _regress ret vwretd
--Jeff --Vince
[email protected] [email protected]