Another big difference between the overparameterized and the cell
means approaches is the size of the X'X matrix built from the
underlying design matrix. In a cell means approach the X'X matrix is smaller
(often much smaller) and of full rank -- no columns/rows need to
be dropped. In the overparameterized model the X'X matrix has
redundancies built in that end up getting dropped out. That is
why I was commenting on comparing the degrees of freedom versus
the number of columns used in the X'X matrix for that particular
example. The D*C*B*G|A term has 8 d.f.s but used 72 columns in
the X'X matrix (all but 8 of which end up getting dropped due to
collinearity with other terms in the model).

Consider an anova with factors A (with 3 levels) and B (with 4
levels):

          |        B
          |   1   2   3   4
    ------+-----------------
        1 |   1   2   3   4
    A   2 |   5   6   7   8
        3 |   9  10  11  12

There are 12 cells in this layout.

The overparameterized model that most people are familiar with
would be run by typing (assuming y is the dependent variable):

    . anova y A B A*B

The number of columns in the X'X matrix and the d.f. for each term
would be

    term     # of cols in X'X     df
    ---------------------------------
    _cons            1             1
    A                3             2
    B                4             3
    A*B             12             6
    ---------------------------------
    total           20            12

There are 8 (= 20-12) columns/rows dropped due to collinearity.
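
As a quick check on the counts above, here is a small sketch in Python (not Stata) that builds the overparameterized design matrix X for the 3 x 4 layout, with one hypothetical observation per cell, and computes its rank. The helper names rank and design_row are made up for this illustration. Since the rank of X equals the rank of X'X, the number of redundant columns is the same for both.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Find a row at or below position rk with a nonzero entry in column c.
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def design_row(a, b, n_a=3, n_b=4):
    """Row of the overparameterized X for an observation in cell (a, b)."""
    cons = [1]
    a_cols = [int(a == i) for i in range(n_a)]
    b_cols = [int(b == j) for j in range(n_b)]
    ab_cols = [int((a, b) == (i, j))
               for i, j in product(range(n_a), range(n_b))]
    return cons + a_cols + b_cols + ab_cols


# One hypothetical observation in each of the 12 cells.
X = [design_row(a, b) for a, b in product(range(3), range(4))]

n_cols = len(X[0])   # 1 + 3 + 4 + 12 columns
r = rank(X)          # number of linearly independent columns
print(n_cols, r, n_cols - r)  # 20 12 8
```

The 12 A*B indicator columns alone already span everything the _cons, A, and B columns can represent, which is why the rank stops at 12.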

The cell means ANOVA approach is

    . egen cells = group(A B)
    . anova y cells, noconstant

Here -egen- numbers the A by B combinations 1 through 12 in the
order shown in the layout above. This is just a oneway anova on the
12 cells that make up A and B. The F tests for A, B, and A*B are
not automatically provided, but can be obtained using -test- with
the -accum- option. Individual degree-of-freedom tests, however,
are easy to think about and form.
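
To make the single degree-of-freedom tests concrete, here is a sketch in Python (not Stata) of the contrast rows on the 12 cell means that such tests would accumulate. It assumes the cells are numbered as in the layout above (A outer, B inner) and uses equally weighted cell means. The names kron, diffs, and rank are made up for this illustration. Each row is one 1-d.f. hypothesis, and the rank of each set of rows gives back the d.f. for that term.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def kron(u, v):
    """Kronecker product of two row vectors (u indexes A, v indexes B)."""
    return [x * y for x in u for y in v]


def diffs(n):
    """Contrasts comparing level 1 with each of levels 2..n."""
    return [[1 if i == 0 else (-1 if i == k else 0) for i in range(n)]
            for k in range(1, n)]


n_a, n_b = 3, 4
ones_a, ones_b = [1] * n_a, [1] * n_b

# Each row is a linear combination of the 12 cell means set equal to zero.
A_rows = [kron(d, ones_b) for d in diffs(n_a)]    # A main effect
B_rows = [kron(ones_a, d) for d in diffs(n_b)]    # B main effect
AB_rows = [kron(da, db) for da in diffs(n_a) for db in diffs(n_b)]  # A*B

df = {"A": rank(A_rows), "B": rank(B_rows), "A*B": rank(AB_rows)}
print(A_rows[0])  # [1, 1, 1, 1, -1, -1, -1, -1, 0, 0, 0, 0]
print(df)         # {'A': 2, 'B': 3, 'A*B': 6}
```

The first A row, for example, says that the sum of cells 1-4 equals the sum of cells 5-8, that is, the level 1 and level 2 means of A are equal.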
>> With your particular case it doesn't look like you can get a
>> S|A*B term (I am assuming A is crossed with B). You say A has 20
>> levels and B has 2 and that there are 400 animals total. Since
>> 20*2 = 400, I guess that means you have one animal per a A*B
>> combination. So you will not be able to estimate a S|A*B term
>> separate from the A*B term. Maybe you will drop the A*B term
>> (and assume that the A*B interaction is insignificant).
>
> Factor A is isogenic strain (all animals genetically the same within
> strain like twins or clones, but animals different between the 20
> strains). Factor B is sex. I have 10 animals per sex, both sexes per
> strain, so I should be able to get the term S|A*B, since I have 10
> animals per A*B combination. 20*2 = 40 A*B levels, I have 400 animals,
> so 10 per combination.
Oops. In my message I said "20*2 = 400" -- duh! You are fine --
as you say, you have 10 animals per A*B combo -- i.e., 20*2*10 = 400.
> Factors C, D, E, F, are drug treatment, test session period, stimulus
> character 1, stimulus character 2.
>
>> I commend the idea of creating an example dataset and doing a dry
>> run of your analysis before collecting the data. This is helpful
>> in complicated designs to help point out limitations or problems
>> you might run into. In some cases it might set you back to
>> rethinking how you want to design your experiment.
>
> In my case I'm fairly limited in being able to obtain 10 animals per
> sex per strain. Too expensive otherwise. So a within subject design
> seems necessary in some fashion. The only real concern I had, carry
> over effects of drug level (saline<->drugA<->drugB) were not a problem
> in another paper where order was counterbalanced by animal and a rest
> period between the three drug level test days was given. Of course, I
> don't claim to know it is the best design. But I do think dividing up
> the limited number of animals into a between group design will lack
> power.
You are probably doing very well with your design. I was just
pointing out in general that running a proposed analysis on
contrived data can help point out unforeseen problems.
I am reminded of my job as a graduate student providing
statistical consulting for graduate students in other scientific
fields who were working on their dissertation or thesis. I
always felt very bad telling someone that they had spent a lot of
time (and possibly money) gathering data that wouldn't answer the
research question they had posed (usually due to confounding).
If they had popped in for a consultation (usually provided
for free by agreement between the different University
departments) before gathering their data, they would have saved
themselves a lot of time and headaches (and possibly graduated
earlier).
Ken Higbee [email protected]
StataCorp 1-800-STATAPC