Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: re: RM ANOVA, was SPSS vs. Stata
From
"Ploutz-Snyder, Robert (JSC-SK)[USRA]" <[email protected]>
To
"[email protected]" <[email protected]>
Subject
st: RE: re: RM ANOVA, was SPSS vs. Stata
Date
Mon, 2 Aug 2010 12:29:26 -0500
" Doesn't SPSS wrap GLM for its RM-ANOVA routines?"
Yes--but with repeated-measures designs, SPSS (and SAS, Systat, and BMDP in the old days) uses listwise elimination. Stata does not. (Is there an option in Stata's anova, repeated() code to do so?)
" Can you post an example of what you are talking about, re listwise elimination? I don't have SPSS."
Here's an example of how Stata does not eliminate subjects listwise in a fixed-factorial repeated-measures ANOVA, compared with SPSS.
IN STATA:
webuse t43
anova score person drug, repeated(drug)
Number of obs =      20          R-squared     =  0.9244
Root MSE      = 3.06594          Adj R-squared =  0.8803

    Source |  Partial SS    df          MS          F   Prob > F
-----------+----------------------------------------------------
     Model |        1379     7         197      20.96     0.0000
           |
    person |       680.8     4       170.2      18.11     0.0001
      drug |       698.2     3  232.733333      24.76     0.0000
           |
  Residual |       112.8    12         9.4
-----------+----------------------------------------------------
     Total |      1491.8    19  78.5157895

Between-subjects error term:  person
                     Levels:  5 (4 df)
     Lowest b.s.e. variable:  person

Repeated variable: drug
                              Huynh-Feldt epsilon        = 1.0789
                              *Huynh-Feldt epsilon reset to 1.0000
                              Greenhouse-Geisser epsilon = 0.6049
                              Box's conservative epsilon = 0.3333

                              ------------ Prob > F ------------
    Source |    df         F  Regular      H-F      G-G      Box
-----------+----------------------------------------------------
      drug |     3     24.76   0.0000   0.0000   0.0006   0.0076
  Residual |    12
----------------------------------------------------------------
IN SPSS:
Tests of Within-Subjects Effects
Measure:MEASURE_1
Source                           Type III Sum of Squares      df  Mean Square       F  Sig.
drug         Sphericity Assumed                  698.200       3      232.733  24.759  .000
             Greenhouse-Geisser                  698.200   1.815      384.763  24.759  .001
             Huynh-Feldt                         698.200   3.000      232.733  24.759  .000
             Lower-bound                         698.200   1.000      698.200  24.759  .008
Error(drug)  Sphericity Assumed                  112.800      12        9.400
             Greenhouse-Geisser                  112.800   7.258       15.540
             Huynh-Feldt                         112.800  12.000        9.400
             Lower-bound                         112.800   4.000       28.200
So Stata and SPSS agree on the repeated-measures F statistic for drug, because there are no missing data in this dataset. However, if we eliminate an observation here and there for a couple of subjects, SPSS and Stata no longer agree, because Stata does not eliminate or ignore cases listwise.
For example IN STATA (using same dataset, but eliminating a couple of obs):
replace score = . in 1 /* eliminated person 1's score for drug 1 */
replace score = . in 10 /* eliminated person 3's score for drug 2 */
anova score person drug, repeated(drug)
Number of obs =      18          R-squared     =  0.9414
Root MSE      =  2.9068          Adj R-squared =  0.9004

    Source |  Partial SS    df          MS          F   Prob > F
-----------+----------------------------------------------------
     Model |  1357.28267     7  193.897525      22.95     0.0000
           |
    person |  653.704895     4  163.426224      19.34     0.0001
      drug |  702.504895     3  234.168298      27.71     0.0000
           |
  Residual |  84.4951049    10  8.44951049
-----------+----------------------------------------------------
     Total |  1441.77778    17  84.8104575

Between-subjects error term:  person
                     Levels:  5 (4 df)
     Lowest b.s.e. variable:  person

Repeated variable: drug
                              Huynh-Feldt epsilon        = 0.5297
                              Greenhouse-Geisser epsilon = 0.4228
                              Box's conservative epsilon = 0.3333

                              ------------ Prob > F ------------
    Source |    df         F  Regular      H-F      G-G      Box
-----------+----------------------------------------------------
      drug |     3     27.71   0.0000   0.0019   0.0047   0.0102
  Residual |    10
----------------------------------------------------------------
NOTE that Stata is still using data from all subjects (levels = 5).
IN SPSS (same dataset):
Tests of Within-Subjects Effects
Source                           Type III Sum of Squares      df  Mean Square       F  Sig.
drug         Sphericity Assumed                  478.333       3      159.444  13.932  .004
             Greenhouse-Geisser                  478.333   1.268      377.157  13.932  .044
             Huynh-Feldt                         478.333   2.466      193.938  13.932  .008
             Lower-bound                         478.333   1.000      478.333  13.932  .065
Error(drug)  Sphericity Assumed                   68.667       6       11.444
             Greenhouse-Geisser                   68.667   2.537       27.071
             Huynh-Feldt                          68.667   4.933       13.920
             Lower-bound                          68.667   2.000       34.333
So in this admittedly simple example, SPSS reports F(3,6) = 13.932, p = .004 (sphericity assumed), whereas Stata reports F(3,10) = 27.71, which is larger than the F of 24.76 from the original analysis with no missing data.
Of course, with a sample size this tiny, we wouldn't trust either analysis. The point is that the prevailing wisdom for fixed-factorial repeated-measures ANOVA is to use listwise elimination, and Stata doesn't do this. (And you get the same Stata results if you use the anova command without the repeated() option but instead define the error terms manually, a process that is itself painful enough to avoid entirely if you have 2 or 3 factors, especially if more than one is repeated.)
I appreciate that it is possible to "manually" tell Stata to ignore listwise those subjects who are missing any data... However this can get more complicated when there is more than 1 repeated measures factor (example, drugs a b c, measured pre and post). And... exactly what is Stata's analysis "by default" anyway? I could not write that up as a standard repeated measures ANOVA because it isn't that. To me, a straightforward improvement to Stata's -anova- would be to force it to ignore any subjects who are missing any repeated measures observations. That alone would be useful.
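For what it's worth, here is a minimal sketch of that "manual" listwise route on the same example. It assumes the data are in long form (one row per person per drug, as in t43); the variable name nmiss is made up for illustration. On this example the sphericity-assumed test should come out as F(3,6) = 13.93, matching the SPSS table above; I have not checked that the epsilon corrections also match.

```stata
webuse t43, clear
replace score = . in 1     /* person 1, drug 1 */
replace score = . in 10    /* person 3, drug 2 */

* count missing scores within each person (nmiss is a hypothetical name)
bysort person: egen nmiss = total(missing(score))

* keep only persons with complete data, i.e. listwise elimination
anova score person drug if nmiss == 0, repeated(drug)
```

Because the count is taken over all of a person's rows, the same flag should carry over to long-form data with more than one repeated factor, though the anova model and error terms would then need to be specified accordingly.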
Rob
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Airey, David C
Sent: Monday, August 02, 2010 10:55 AM
To: [email protected]
Subject: st: re: RM ANOVA, was SPSS vs. Stata
.
> What SPSS still maintains over Stata is better ANOVA routines,
> particularly Repeated-Measures fixed-factor designs. Stata treats RM
> designs a bit strangely, I believe because it seems to "wrap" ANOVA code
> around Regression methods. It's non-intuitive and can provide results
> that aren't typical of RM ANOVA (consider how it uses full-n for
> fixed-factor RM ANOVA without listwise elimination of subjects who are
> missing an observation). I would much prefer to see Stata invest in
> re-working their ANOVA code and analyses so that it is more consistent
> with SAS or SPSS methodologies, offers more in terms of assumption
> testing (ex. Sphericity tests), and is more intuitive.
Michael Mitchell pointed this out in his head to head to head comparison of Stata, SPSS, and SAS some years ago in a report posted at ATS UCLA.
I don't know if this is true anymore with version 11.1 of xtmixed and the margins functionality. This book shows use of xtmixed in designed experiments:
<http://www-personal.umich.edu/~bwest/almmussp.html>
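A minimal sketch of the xtmixed route, assuming the t43 example data with two scores set to missing as in the example in this thread. The model (a random intercept for person, drug as a fixed factor) is only one plausible specification, not something taken from the book. Unlike listwise elimination, xtmixed keeps every non-missing observation, and testparm gives a Wald chi-squared test rather than the RM-ANOVA F, so the numbers are not expected to match the anova or SPSS output.

```stata
webuse t43, clear
replace score = . in 1
replace score = . in 10

* random intercept for person; i.drug uses factor-variable syntax (Stata 11+)
xtmixed score i.drug || person:, reml

* joint Wald test of the drug effect
testparm i.drug
```

In Stata 13 and later the command is named mixed rather than xtmixed.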
BTW, you can test sphericity in Stata directly with the mvtest command or by asking for the univariate rm-anova corrections when you use the "repeated(varlist)" option to anova.
Doesn't SPSS wrap GLM for its RM-ANOVA routines?
Can you post an example of what you are talking about, re listwise elimination? I don't have SPSS.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/