Michael Ingre <[email protected]> asks:
> I have tried fitting a repeated measures anova in Stata and I was
> surprisingly disappointed with the performance. My dataset contains 17
> subjects observed 20 times a day during three different days. It is a simple
> two-factor repeated measures ANOVA with a total of 1020 observations.
>
> . anova dv subject day / subject*day time / subject*time day*time
> ,repeated(day time)
>
> I timed it this morning and Stata/SE (8.2) took 7 minutes 30 seconds to
> complete the analyses!!!!
>
> My computer is not the fastest in the world (PowerBook G4, 800Mhz, 640MB
> RAM) but SPSS run the same model in seconds!!! (SPSS report 2 seconds
> processor time but there is some overhead). And my experience from similar
> models in SPSS and StatView (StatView does not calculate epsilon) over the
> last five of years or so, is that it should run in seconds rather than
> minutes even if the model is considerably larger.

I created a dataset based on the information you provided. I ran
your -anova- on my 2.4 GHz computer running Linux. It finished
in just under a minute. I do not know what SPSS and StatView are
doing and so cannot fully explain the differences in timing.

The traditional (standard) approach to ANOVA is called the
"overparameterized ANOVA model". This is the approach used by
the -anova- command in Stata. In this approach, the SSCP
(sums-of-squares and cross-products) matrix is based on the full
set of dummy (also called indicator) variables and their
interactions based on the terms listed in the ANOVA. (We don't
actually create the dummy variables, but the resulting SSCP
matrix is the same as if we did.)

For this particular ANOVA we have a 492 by 492 SSCP matrix. The
492 is based on the following breakdown of columns:

    Term                     Columns    d.f.
    ----------------------------------------
    constant                       1
    subject                       17      16
    day                            3       2
    subject*day   (17*3)          51      32
    time                          20      19
    subject*time  (17*20)        340     304
    day*time      (3*20)          60      38
    ----------------------------------------
    Total                        492     411

The number of degrees of freedom for the model is 411.
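
The column and d.f. counts above are plain arithmetic and can be
checked in a few lines. What follows is only a sketch in Python
(not Stata code). The names levels, terms, n_columns, and n_df
are made up for illustration, and the d.f. rule used (the product
of "levels minus one" over the factors in a term) assumes a
balanced design with no empty cells.

```python
from math import prod

# Number of levels of each factor, taken from the description of the data.
levels = {"subject": 17, "day": 3, "time": 20}

# Terms of the model as listed in the -anova- command (constant handled apart).
terms = [
    ("subject",),
    ("day",),
    ("subject", "day"),
    ("time",),
    ("subject", "time"),
    ("day", "time"),
]


def n_columns(term):
    """Indicator columns for a term: one per combination of levels."""
    return prod(levels[f] for f in term)


def n_df(term):
    """Degrees of freedom for a term in a balanced, complete design."""
    return prod(levels[f] - 1 for f in term)


columns = 1 + sum(n_columns(t) for t in terms)  # the 1 is the constant
model_df = sum(n_df(t) for t in terms)

for t in terms:
    print("*".join(t), n_columns(t), n_df(t))
print("Total", columns, model_df)
```

The totals agree with the table: 492 columns and 411 model
degrees of freedom.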

Stata applies the matrix sweep operator to the resulting 492 by
492 matrix in order to solve the normal equations. During the
sweep, 81 of the columns are "swept" out of the matrix (set to
zero, which marks them as dropped), leaving the 411 that
correspond to the degrees of freedom for the model.
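
To make the idea concrete, here is a minimal sketch of a sweep on
a tiny overparameterized model, written in Python. It is not
Stata's implementation. The function names, the sign convention
of the sweep, and the tolerance rule for deciding that a column
is redundant are all assumptions made for illustration. The toy
data are two groups with two observations each (y = 1, 3 and
y = 5, 7), with columns for the constant and for each group.

```python
def sweep(a, k):
    """Return a copy of the symmetric matrix a swept on pivot k."""
    n = len(a)
    d = a[k][k]
    b = [row[:] for row in a]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                b[i][j] = -1.0 / d
            elif i == k or j == k:
                b[i][j] = a[i][j] / d
            else:
                b[i][j] = a[i][j] - a[i][k] * a[k][j] / d
    return b


def solve_by_sweep(xtx, xty, yty, tol=1e-10):
    """Sweep the augmented SSCP matrix, dropping redundant columns.

    Returns (coefficients, residual sum of squares, dropped columns).
    The tolerance rule is an illustrative assumption.
    """
    p = len(xtx)
    # Augmented SSCP matrix: [[X'X, X'y], [y'X, y'y]].
    a = [[float(v) for v in xtx[i]] + [float(xty[i])] for i in range(p)]
    a.append([float(v) for v in xty] + [float(yty)])
    diag0 = [a[k][k] for k in range(p)]  # original diagonal, for scaling
    dropped = []
    for k in range(p):
        if abs(a[k][k]) <= tol * diag0[k]:
            # Nothing is left of this column after the earlier sweeps:
            # it is collinear, so zero it out instead of sweeping it.
            dropped.append(k)
            for i in range(p + 1):
                a[i][k] = 0.0
                a[k][i] = 0.0
        else:
            a = sweep(a, k)
    beta = [a[k][p] for k in range(p)]
    rss = a[p][p]
    return beta, rss, dropped


# Columns: constant, group 1 indicator, group 2 indicator.
xtx = [[4, 2, 2],
       [2, 2, 0],
       [2, 0, 2]]
xty = [16, 4, 12]
yty = 84  # 1 + 9 + 25 + 49

beta, rss, dropped = solve_by_sweep(xtx, xty, yty)
print(beta, rss, dropped)  # [6.0, -4.0, 0.0] 4.0 [2]
```

The third column (the group 2 indicator) is dropped because it
equals the constant minus the group 1 indicator. The remaining
coefficients reproduce the group means (6 - 4 = 2 and 6), and the
residual sum of squares is 4.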

When everything is balanced, there may be faster ways of getting
to the same answer. But Stata's -anova-, because it uses the
sweep operator, can handle designs that are not balanced
(including designs with missing cells) and that may have other
collinearities (from continuous variables included in the model).
In those cases, the faster methods may not apply.

Many years ago, when I encountered SAS in school, it had a
-PROC ANOVA- that required a balanced design (and I am guessing
that is still true). If you did not have a balanced design, you
needed to use -PROC GLM-, and as I understand it, -PROC GLM-
uses a sweep operation (similar to what Stata uses) to get the
answer. I would be surprised if the speeds of SAS -PROC GLM- and
Stata -anova- were drastically different.

David Airey <[email protected]> mentioned several
alternatives for repeated measures data, including Stata's
-manova- command, which was introduced in Stata 8. I personally
prefer MANOVA to repeated measures ANOVA. (But there are some
cases where the MANOVA cannot be done -- too many y variables
compared to the number of observations -- while the repeated
measures ANOVA can still be computed.)

Ken Higbee [email protected]
StataCorp 1-800-STATAPC