Austin Nichols <[email protected]> gives an example comparing the speed of the
-lowess- and -locpoly- commands:
> It may be that the labor-intensive part of the -lowess- command is compiled
> code (_LOWESS is built-in in both Stata 8 and Stata 9), but it does run a
> whole heck of a lot slower than -locpoly- (which is why I assumed it was
> interpreted code, I guess). In this simple example, -lowess- takes more than
> 15 times as long (2 min versus 6 sec).
> clear
> sysuse auto
> replace wei=round(wei/10)
> expand wei
> set rmsg on
> locpoly price mpg, name(locpoly) width(2)
> lowess price mpg, name(lowess) bw(1)
-lowess- is much slower than -locpoly- in this example because it performs far
more weighted regressions. By default, -locpoly- uses min(_N,50) equally
spaced smoothing points, whereas -lowess- estimates the smooth at each value
of the explanatory variable (mpg in this example), so its number of smoothing
points equals _N, the number of observations. In the example above, -locpoly-
performs only 50 weighted regressions while -lowess- runs _N = 22344 of them.
The two commands are also hard to compare directly because each uses a
different weighting procedure. However, the following gives a clearer picture.
clear
sysuse auto
replace wei=round(wei/10)
expand wei
keep if _n<1000
set rmsg on
locpoly price mpg, width(1) nograph at(mpg)
lowess price mpg, bw(1) nograph mean
On my computer I got the following results:
. locpoly price mpg, width(1) nograph at(mpg)
r; t=0.53 14:51:13
. lowess price mpg, bw(1) nograph mean
r; t=0.13 14:51:13
We can see that -lowess- now runs faster. By specifying the option -at(mpg)-,
we request that -locpoly- evaluate the smooth at each value of the variable
mpg, so both commands now perform the same number of regressions. Also,
-locpoly- performs local mean smoothing by default, so we specify the option
-mean- with -lowess- to request mean smoothing as well. If graphs are not
needed, you can use -nograph- to save the time required to generate them.
Both commands use C code to perform the regressions, and the speed of each
depends heavily on the number of smoothing points. If the dataset is large,
-lowess- will take a long time to run. -locpoly- will run faster unless
-at()- is specified or a large number of smoothing points is requested with
-n()-.
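
To see how the run time of -locpoly- depends on the number of smoothing
points, something like the following could be tried. This is only a sketch: it
reuses the 999-observation dataset from above, the n() values of 50 and 200
are arbitrary choices for illustration, and it assumes -locpoly- is installed.
Timings will differ from machine to machine, so none are shown.

```stata
* Sketch: time -locpoly- as the number of smoothing points grows.
* The n() values below are illustrative, not recommendations.
clear
sysuse auto
replace weight = round(weight/10)
expand weight
keep if _n < 1000
set rmsg on

* 50 equally spaced smoothing points (the default when _N >= 50)
locpoly price mpg, width(1) nograph n(50)

* 200 equally spaced smoothing points
locpoly price mpg, width(1) nograph n(200)

* one smoothing point per observation, as -lowess- does
locpoly price mpg, width(1) nograph at(mpg)

set rmsg off
```

The t= values reported after each command should increase as more smoothing
points are requested.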
-- Yulia
[email protected]