Michael Ingre replied:
Ken Higbee <[email protected]>:
> I created a dataset based on the information you provided. I ran
> your -anova- on my 2.4 GHz computer running Linux. It finished
> in just under a minute. I do not know what SPSS and StatView are
> doing and so cannot fully explain the differences in timing.
I need to correct my timing a bit. My PowerBook (apparently) did not
feel very well yesterday. I ran it three times this morning, and each
run took 3 minutes 29-32 seconds on an iMac G4 800 MHz. That is still,
however, 100 times slower than SPSS.
On my computer, a 1.25 GHz Powerbook, the timing for this problem with
Michael Ingre's data set was:
r; t=119.92 9:02:08
Most of this was due to the epsilon correction calculations. The
uncorrected ANOVA table was completed in less than 30 seconds (probably
~ 20 s).
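For readers unfamiliar with the epsilon corrections mentioned above: they rescale the degrees of freedom of the within-subject F tests when the sphericity assumption fails. The thread does not say which estimator dominated the run time. Purely as an illustration, here is a minimal Python sketch of the Greenhouse-Geisser estimate computed from a symmetric k x k covariance matrix of the repeated measures. The function name and both example matrices are hypothetical, and this is not Stata's implementation.

```python
def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon from a symmetric k x k covariance matrix."""
    k = len(cov)
    row_mean = [sum(row) / k for row in cov]
    grand_mean = sum(row_mean) / k
    # Double-center the matrix: remove row and column means, add back the
    # grand mean (column means equal row means because cov is symmetric).
    centered = [[cov[i][j] - row_mean[i] - row_mean[j] + grand_mean
                 for j in range(k)] for i in range(k)]
    trace = sum(centered[i][i] for i in range(k))
    sum_sq = sum(v * v for row in centered for v in row)
    return trace ** 2 / ((k - 1) * sum_sq)

# Hypothetical covariance matrices for three repeated measures.
compound_symmetric = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
unequal_variances = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 4.0]]

eps_cs = greenhouse_geisser_epsilon(compound_symmetric)  # sphericity holds
eps_uv = greenhouse_geisser_epsilon(unequal_variances)   # sphericity violated
print(eps_cs, eps_uv)
```

An epsilon of 1 leaves the degrees of freedom unchanged; smaller values shrink them, down to a lower bound of 1/(k-1).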
Data Desk 6.2 calculated the ANOVA table (using GLM) in less than 3
seconds:
Design:

Source    F/R  max df  EMS            F-Denom
Const     -         1  sbt+Const      sbt
sbt       R        16  sbt            Error
day       F         2  sbt*day+day    sbt*day
sbt*day   M        32  sbt*day        Error
tim       F        19  sbt*tim+tim    sbt*tim
sbt*tim   M       304  sbt*tim        Error
day*tim   F        38  day*tim        Error
Error     R       608
Total            1019

ANOVA:

Source      df  SS        MS        F        P
Const        1  17708.3   17708.3   209.28   ≤ 0.0001
sbt         16  1353.83   84.6146   69.225   ≤ 0.0001
day          2  80.6608   40.3304   5.1077   0.0119
sbt*day     32  252.673   7.89602   6.4599   ≤ 0.0001
tim         19  113.157   5.95562   2.5548   0.0005
sbt*tim    304  708.676   2.33117   1.9072   ≤ 0.0001
day*tim     38  53.4961   1.40779   1.1517   0.2487
Error      608  743.171   1.22232
Total     1019  3305.67
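
As a quick cross-check of how the two Data Desk tables fit together, the following Python sketch recomputes each F ratio as the term's mean square divided by the mean square of the term named in the F-Denom column. The numbers are copied from the tables in this message; the variable names are only illustrative.

```python
# Mean squares copied from the Data Desk ANOVA table in this message.
ms = {
    "Const": 17708.3, "sbt": 84.6146, "day": 40.3304, "sbt*day": 7.89602,
    "tim": 5.95562, "sbt*tim": 2.33117, "day*tim": 1.40779, "Error": 1.22232,
}

# Denominator term for each F test, from the F-Denom column of the design table.
f_denom = {
    "Const": "sbt", "sbt": "Error", "day": "sbt*day", "sbt*day": "Error",
    "tim": "sbt*tim", "sbt*tim": "Error", "day*tim": "Error",
}

# F = MS(term) / MS(denominator term)
f_ratio = {term: ms[term] / ms[denom] for term, denom in f_denom.items()}

for term, f in f_ratio.items():
    print(f"{term:8s} F = {f:.4f}")
```

The fixed effects day and tim are tested against their interactions with subject, not against the residual error, which is why their F values are much smaller than a test against Error would give.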
Data Desk does not require balanced data for univariate repeated
measures ANOVA; a subject is not dropped entirely because one repeated
observation is missing. On the other hand, Data Desk offers no
corrections. Data Desk can calculate repeated measures designs using
MANOVA, but only in a limited way, unlike Stata. According to the
manual, Data Desk could not, for example, compute Ingre's problem using
MANOVA. Stata can.
> When everything is balanced there may be faster ways of getting
> to the same answer. But, Stata's -anova-, using the sweep
> operator, is able to handle designs that are not balanced
> (including having missing cells) and that may have other
> collinearities (from continuous variables included in the model).
> In those cases, the faster ways of getting to the answer may not
> hold.
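
To make the point about the sweep operator concrete, here is a minimal pure-Python sketch of the general technique: sweeping the cross-product matrix of [X y] on each column of X yields the least-squares coefficients and the residual sum of squares, and a column whose pivot has collapsed to (near) zero is collinear with earlier columns and can simply be skipped. This illustrates the textbook operator under stated assumptions (the function names, tolerance, and data are hypothetical); it is not Stata's actual code.

```python
def sweep(a, k):
    """Return a copy of the square matrix `a` (list of lists) swept on pivot k."""
    n = len(a)
    d = a[k][k]
    out = [row[:] for row in a]
    out[k] = [v / d for v in a[k]]
    for i in range(n):
        if i != k:
            b = a[i][k]
            out[i] = [a[i][j] - b * out[k][j] for j in range(n)]
            out[i][k] = -b / d
    out[k][k] = 1.0 / d
    return out

def sweep_fit(X, y, tol=1e-10):
    """Least squares via sweeps; collinear columns are dropped (coefficient 0)."""
    rows = [list(x) + [yi] for x, yi in zip(X, y)]
    m = len(rows[0])
    p = m - 1
    # Cross-product matrix of the augmented data [X y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    scale = [A[k][k] for k in range(p)]
    dropped = []
    for k in range(p):
        # A pivot near zero means column k is explained by earlier columns.
        if A[k][k] <= tol * scale[k]:
            dropped.append(k)
            continue
        A = sweep(A, k)
    beta = [0.0 if k in dropped else A[k][p] for k in range(p)]
    return beta, A[p][p], dropped

# Hypothetical data: the third column equals the sum of the first two.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, xi, 1.0 + xi] for xi in x]
y = [1.0, 3.1, 4.9, 7.2, 8.8]

beta, rss, dropped = sweep_fit(X, y)
print(beta, rss, dropped)
```

Because each column is handled one pivot at a time, nothing in the procedure relies on the design being balanced, which is the flexibility described above.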
Yes. That's it. Thank you, Ken, for making that point. SPSS and StatView
accept only cases with complete data on all measurements. In this area
Stata outperforms the competition.
The ability to analyze unbalanced designs with missing cells is
intriguing, and I can think of many situations where it could be useful.
Special care must be taken, though, when there is a lot of missing data
or when the pattern of missing data is systematic.
Given the enormous speed improvement with (presumably) the alternative
way of calculating ANOVAs, an alternative procedure for anova (for
complete-case data) is high up on my wish list. I guess it is also on
David Airey's (did your anova finish at all?) and on those of others who
do experimental research.
No, the ANOVA did not finish. Or rather the epsilon corrections never
finished. My conclusion was that I should use a different approach
altogether, for two reasons. One is that the ANOVA I discussed online
previously was actually a smaller test version of the one I really need
to run. It turns out that the design matrix limits are too small in
Stata SE. My inadequate understanding is that both Proc Mixed and R LME
use alternative ways of representing matrices during internal
calculations, and are able to compute problems of the size I am
interested in. The second reason is that both Proc Mixed and LME allow
different covariance structures to be modeled, which is more realistic
for repeated measures situations.
> David Airey <[email protected]> mentioned several
> alternatives for repeated measures data including Stata's
> -manova- command that was introduced in Stata 8. I personally
> like MANOVA over repeated measures ANOVA. (But there are some
> cases where the MANOVA cannot be done -- too many y variables
> compared to the number of observations -- where the repeated
> measures ANOVA can still be computed.)
MANOVA is an interesting alternative in many situations. I will
consider it when appropriate. If I'm not mistaken though, the present
analysis would not run in MANOVA because it would mean 3*20 dependent
variables and only 17 subjects. This is also typical for many of our
experiments (and some of our field studies) so ANOVA would still be our
main approach.
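
The counts behind this argument can be laid out in a few lines of Python. The sketch uses the 17 subjects, 3 days, and 20 time points mentioned in this thread. The feasibility rule (error degrees of freedom at least as large as the number of dependent variables, with a single group of subjects) is a simplification assumed here for illustration, not a statement of how any particular program decides.

```python
n_subjects, n_days, n_times = 17, 3, 20  # counts stated in this thread

# Degrees of freedom for the univariate repeated measures layout.
df = {
    "sbt": n_subjects - 1,
    "day": n_days - 1,
    "sbt*day": (n_subjects - 1) * (n_days - 1),
    "tim": n_times - 1,
    "sbt*tim": (n_subjects - 1) * (n_times - 1),
    "day*tim": (n_days - 1) * (n_times - 1),
    "Error": (n_subjects - 1) * (n_days - 1) * (n_times - 1),
}
df_total = n_subjects * n_days * n_times - 1

# In the multivariate layout each day-by-time cell is one dependent variable.
n_dependent = n_days * n_times

# Assumption: one group of subjects, so at most n_subjects - 1 error df.
manova_feasible = (n_subjects - 1) >= n_dependent

print(df, df_total, n_dependent, manova_feasible)
```

The univariate degrees of freedom add up to the total of 1019 reported by Data Desk, while the multivariate layout would need 60 dependent variables from only 17 subjects.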
Ouch. Are there no alternatives in any of the xt- commands in Stata?
This is usually where I get frustrated with what I don't
know--(in?)applicability of the xt commands to experimental repeated
measures data typically analyzed by ANOVA/MANOVA or mixed modeling.
David Airey <[email protected]>:
> As for me, the more I use Stata, the more I like it, but the more I
> mess around with statistics, the more tools I wind up exploring (Data
> Desk, Stata, and R, so far).
Agreed. Stata is really growing on me. And this is of course part of my
problem. I want Stata to be able to do it all ... I don't want to spend
time in too many programs, but I have realized that there are limits
even to Stata. Currently, though, I have my hands full with learning
Stata and LISREL. And soon I will take a course in GLLAMM.
> For biologists using statistics, the main weaknesses of Stata are
> currently a lack of a routine like SAS Mixed or R LME/NLME
Mixed modeling is an area that I'm very interested in. I have no
practical experience with it, but from what I've read it is the answer
to many of my problems. That's why I will take some time to learn
GLLAMM, which, as I understand it, is the closest to Proc Mixed you can
get in Stata.
Yes, but another list member has repeatedly stated that GLLAMM has
limited ability for modeling the covariance structure. When you take
that course, report back!
> Please send me the data set if it's not private, and I will run on my
> Powerbook to compare times. I'm curious about this. I have a 1.25 GHz
> Powerbook.
Check your mail.
Finally, many thanks to Ken Higbee and David Airey for your time and
knowledge.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/