David Ronis <[email protected]> asks:
> My experience with other software is that repeated measures ANOVA will
> either drop cases with any missing data or fail to run when there are
> missing data. Before getting into more complex procedures like SAS PROC
> MIXED I thought I'd give it a try in Stata. My expectation was that the
> failure would help motivate me for the work ahead.
>
> I studied Kenneth Higbee's FAQ a bit at
> www.stata.com/support/faqs/stat/anova2.html
>
> <cut>
>
> To my surprise, the following code ran and gave results that seemed
> reasonable (given my eyeballing the data and means):
>
> clear
> capture log close
> log using preeval2.log, replace
> * Approach from Higbee FAQ
> * http://www.stata.com/support/faqs/stat/anova2.html
> set matsize 800
> set memory 4m
> set more off
> use e:\yeo2\vo2-pree-val.dta
>
> anova vo2 id time / time*id stage / stage*id machine / machine*id /*
> */ time*stage / time*stage*id /*
> */ time*machine / time*machine*id /*
> */ stage*machine / stage*machine*id /*
> */ time*stage*machine / , /*
> */ repeated (time stage machine)
>
> I'm wondering whether this is really an appropriate analysis, and what
> assumptions it / I may be making (especially unusual ones)? For sig test
> results I'm looking at the adjusted ones, not those in the initial ANOVA
> table. It has been about 20 years since I studied ANOVA.
In the interest of brevity I will point you (and others interested
in the subject) back to some statalist threads of long ago.
At the end of July 2001 a similar question was asked. You can
go to
http://groups.yahoo.com/group/statalist/message/25690
to read the message. It quotes from
Milliken & Johnson, 1984, "Analysis of Messy Data, Volume 1:
Designed Experiments", Van Nostrand Reinhold Company, New York.
ISBN: 0-534-02713-7
and has some further discussion. It also points to a statalist
discussion that happened in mid October of 2000. I would point
you to a web link, but the Yahoo site keeps only a limited
buffer of old messages (currently it goes back only to Nov. of
2000). The archives at Harvard
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/
appear to only go back to January 2001. And the archives at
http://www.uc.pt/pessoal/ramalheira/stblist.htm
appear to go from 1994 through July 1998.
So, since you might have a hard time finding the discussion from
Oct. 2000, here is what I wrote on 12 Oct 2000 at the conclusion
of the discussion.
-----------------------------------------------------------------------
From: [email protected]
To: [email protected]
Subject: Re: Unbalanced Repeated Measures ANOVA
Al Feiveson <[email protected]> provides a good
caution regarding an example I gave with a substantial number of
missing cells in a repeated measures ANOVA.
> Ken - I see that Stata will produce the ANOVA as you have
> indicated - but how good are the "F" statistics? If I am not
> mistaken, they won't really have an exact "F"-distribution even
> if the error terms are independent normal and homoscedastic. Of
> course, nothing is really normally distributed, etc, anyway, so
> this is probably a moot issue.
In complicated ANOVA designs with missing cells, the "proper"
F-tests can be difficult (sometimes impossible) to construct.
In simpler designs where residual error is the only error term,
the missing cells in the design do not change the use of MSerror
for the denominator of the F test.
In more complicated designs where there are different error terms
for various levels of the model, the expected mean squares in the
presence of missing cells can lead to very complicated tests. A
discussion of this can be found in
Milliken and Johnson, 1984, "Analysis of Messy Data, Volume 1:
Designed Experiments", New York: Van Nostrand Reinhold
Company
In particular, around page 395 it shows an example of how you
would form a test, and, as Al Feiveson alludes to, even the test
they contrive does not truly follow an F distribution. They say
concerning the test that it
"... does not have an exact F-distribution since (1) the
statistic in the denominator does not have a distribution
that is proportional to an exact chi-square distribution, and
(2) the numerator and denominator may not be independently
distributed. ..."
So if you have missing cells in a complicated ANOVA design you
will need to exercise caution in interpreting F-tests that use
nonresidual error terms.
In practice, I believe many people close their eyes tight and
proceed with the tests as if the missing cells were not present.
With only a small percentage of the cells missing this may be
reasonable. With a larger percentage of cells missing it may not
be reasonable.
-----------------------------------------------------------------------
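To judge whether you have "only a small percentage" of cells missing,
one rough check is to count the observed cells against the cells of
the complete design. What follows is only a sketch. It assumes the
data are in long form (one record per measurement) and borrows the
variable names id, time, stage, machine, and vo2 from the anova
command quoted above; adjust it to your own data.

```stata
* Sketch only: variable names are taken from the quoted anova command.
preserve
keep if !missing(vo2)
* one record per observed id x time x stage x machine cell
contract id time stage machine
local observed = _N
restore

* cells in the complete design = product of the numbers of levels
local full = 1
foreach v of varlist id time stage machine {
    quietly tabulate `v' if !missing(vo2)
    local full = `full' * r(r)
}

display "observed cells: `observed' of `full'"
display "percent missing: " %5.1f 100*(1 - `observed'/`full')
```

The count says nothing about which cells are missing or why, so treat
it as a first look rather than a rule for when the F-tests can be
trusted.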
Ken Higbee [email protected]
STATA CORP 1-800-STATAPC