Statalist



RE: st: Anova and Contrasts with missing cells


From   "Joseph Coveney" <[email protected]>
To   "Statalist" <[email protected]>
Subject   RE: st: Anova and Contrasts with missing cells
Date   Fri, 31 Oct 2008 00:57:53 +0900

Thomas J. Steichen wrote (excerpted and with a few replies interwoven):

. . . both SAS and JMP interpreted a highly similar method of specifying the
contrast as being what I intended. One can debate whether my intention was a
reasonable intention but, assuming it was, Stata tested something else.

--------------------------------------------------------------------------------

JC:

I'm not sure what your SAS contrast statement looks like, but it's possible
that you were just tripped up by a difference in syntax between SAS and
Stata.

--------------------------------------------------------------------------------

TJS:

My question from my first post, "Which is right?", still stands. Maybe a
better question is, What do the two contrasts, the one I used and the one
Joe proposes, really say?

--------------------------------------------------------------------------------

JC:

I have more specifics below on what the two contrasts really say.  In
general, I find that it helps to examine the parameterization and then
define the contrast. It makes it easier to ensure that you're testing the
hypothesis that you intend to.  As Bill Gould mentioned in the post you
cited, both Stata and SAS allow you to see their parameterizations, and so
you can formulate your contrast in view of the parameterization the package
is using.

--------------------------------------------------------------------------------

TJS:

Interestingly, Joe used the phrase "correct contrast", but I think he really
only meant the contrast that tests what appears to be what I intended.

--------------------------------------------------------------------------------

JC:

That's correct, er, right.  I ought to have written more precisely; sorry.
I do believe, though, that what I showed is the most reasonable contrast to
make if you're interested in testing a difference between Round 1 and
Round 3, and it seemed to be the one you intended.  It amounts to
interpolating (predicting, as Bill Gould wrote in the posts you cite) the
Size-600 value for Round 3.  The contrast that you specified tests the
difference between the Round 1 / Size 600 cell mean and the
Round 3 / Size 800 cell mean.
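
In cell-mean terms, the two contrasts can be sketched as below.  The notation
is an assumption added for illustration:  \mu_{r,s} is the population mean of
the Round r / Size s cell, and s' stands for the second Size level observed
in Round 3, which this post doesn't name.  The sketch also assumes, as the
cell-means -lincom- in the Stata code at the end suggests, that Round 1 has
a single observed cell and Round 3 has two.

```latex
% Contrast as originally specified (TJS): one cell against one cell
H_0^{\mathrm{TJS}}:\quad \mu_{1,600} - \mu_{3,800} = 0

% Contrast proposed here (JC): Round 1 against the average of Round 3's cells
H_0^{\mathrm{JC}}:\quad \mu_{1,600} - \frac{1}{2}\left(\mu_{3,s'} + \mu_{3,800}\right) = 0
```

The two hypotheses coincide only if the two Round 3 cell means are equal,
which is why the two tests can give different answers on the same data.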

--------------------------------------------------------------------------------

TJS:

My actual point in quoting the above revolves around the choice of
specifying the model as -anova nnn round size- versus -anova nnn round
size|round- and the resulting impact on testing. As Joe says, the ANOVA
estimates are the same (well, he doesn't say exactly that, but I think that
is what he means by "equivalent"), however, it appears there is no way to
specify the contrast based on nested model symbolism. That is, I was unable
to find a way to symbolically specify anything about -size|round- using that
notation in either -test- or -lincom-. Alternatively, after the nested model
(or the crossed model), one can specify:

 test _b[round[1]] =  _b[round[3]] +  _b[size[1]*round[3]] / 2

Or

 mat test13 = (0,    1, 0, -1, 0, 0,   -.5, 0, 0, 0, 0, 0)
 test, test(test13)

And the test will be performed. The first of these clearly resorts to
crossed-model notation (even in the nested setup!).

My question: Is there a way to directly specify a nested term in -test-
or -lincom- using nested notation (i.e., of the form: a|b)? If there is, I
haven't found it.

--------------------------------------------------------------------------------

JC:

You're correct about what I meant by equivalence.

I'm not sure, though, that I follow what you're saying about the contrast
resorting to crossed-factors notation:  _b[size[1]*round[3]] _is_ a
nested-factor term.  Nested factors are nothing special:  -anova nnn round
size|round- is essentially -anova nnn round size*round-, that is, an
interaction term without including the nested factor among the main effects.
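
Written out, the two specifications describe the same set of cell means.
What follows is a sketch in generic textbook notation; the symbols are
assumptions for illustration and are not taken from Stata's or SAS's
output.

```latex
% Nested specification: size within round
y_{rsk} = \mu + \alpha_r + \beta_{s(r)} + \varepsilon_{rsk}

% Interaction specification with no main effect for size
y_{rsk} = \mu + \alpha_r + (\alpha\beta)_{rs} + \varepsilon_{rsk}
```

In both forms there is one extra term for each observed combination of
round r and size s, so \beta_{s(r)} and (\alpha\beta)_{rs} are just two
labels for the same coefficients.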

And there's nothing really different between nested factors and crossed
factors in using -test- or -lincom-:  you can't, to my knowledge, use either
a|b or a*b notation, per se, for testing simple effects or constructing
custom contrasts of individual cell means.

--------------------------------------------------------------------------------

TJS:

Clearly, in this model, SAS and JMP test a contrast on -round- of the form
(1, 0, -1, 0, 0) differently than Stata does. That implies that
the packages assume different things to make it testable. In Stata notation,
SAS and JMP test contrast matrix (0,  1,0,-1,0,0,  -.5,0,0,0,0,0) while
Stata tests (0,  1,0,-1,0,0,  0,0,0,0,0,0) (i.e., exactly what I
specified!). It also implies that the interpretation of those tests depends
on those assumptions. I don't know what to say about that other than to be
cautious!

--------------------------------------------------------------------------------

JC:

Again, I don't know what you did for the contrast in SAS, but the difference
might be just a syntax difference in the way that Stata and SAS have you
specify the same contrast.  Overall, Stata seems to me to be at least as easy
as SAS for forming custom contrasts after ANOVA or other estimation
commands:  in the SAS program below, although the ESTIMATE statement,
itself, is simple enough in appearance, the documentation for it isn't.  In
addition, looking at the SAS output, it appears that SAS translated the
ESTIMATE statement into "twice ROUND 1 versus (the equivalent of) twice
ROUND 3", which seems odd and roundabout.  (It's too voluminous to post
here, but I can e-mail the SAS output for the code below privately.)

I agree about having to be cautious.  For example, apparently, you _must_
use the nested-factor specification in order for SAS to fit an ANOVA model
properly with your dataset.  If you fail to do this, you get junk:  the
ANOVA table from the first PROC GLM below has 3 DF for ROUND + 1 DF for SIZE,
which don't add up to the 5 DF stated for the Model.*  I don't know what
ROUNDs and SIZEs are, but from looking at the dataset they don't strike me
as being naturally hierarchal or in a nested relationship (strictly one
within the other), at least in the manner that Winer liked to illustrate,**
and so I wouldn't have considered specifying SIZE(ROUND) off the top of my
head.

I'm guessing that SAS's so-called TYPE IV sums of squares would be no
easier, at least for me, to work with here.  In general, I've found
that setting up sensible contrasts of interest after fitting a cell-means
model (Milliken & Johnson, cited last time) in Stata is relatively
straightforward under these kinds of circumstances.

Joseph Coveney

* Is it common for SAS to do stuff like this?  I don't recall ever having
seen Stata blithely produce an otherwise normal-looking ANOVA table except
that the degrees of freedom don't add up.  Has anyone run across an example?

** B. J. Winer, D. R. Brown & K. M. Michels, _Statistical Principles in
Experimental Design_  Third Edition.  (New York:  McGraw-Hill, 1991),
pp. 358-65; 456-60; 502-4.

DATA TJS (DROP = NNN_ADJM);
INPUT ROUND SIZE NNN NNN_ADJM;
CARDS;
[dataset snipped--copy & paste from original post]
;
RUN;
PROC PRINT DATA = TJS;
RUN;
PROC GLM DATA = TJS;
   CLASS ROUND SIZE;
   MODEL NNN = ROUND SIZE / E3 SS3 SOLUTION;
RUN;
PROC GLM DATA = TJS;
   CLASS ROUND SIZE;
   MODEL NNN = ROUND SIZE(ROUND) / E3 SS3 SOLUTION;
   ESTIMATE 'ROUND 1 VERSUS ROUND 3' ROUND 1 0 -1 / E;
RUN;


clear *
set more off
input round size nnn nnn_adjm
[dataset snipped--see original post]
end
drop nnn_adjm

// Nonnested-factor specification
anova nnn round size, class(round size)
test round size, symbolic
anova , regress
lincom _b[round[1]] - _b[round[3]] - _b[size[1]] / 2

// Nested-factor specification
anova nnn round size|round, class(round size)
test round size|round, symbolic
anova , regress
lincom _b[round[1]] - _b[round[3]] - _b[size[1]*round[3]] / 2

// Alternative syntax for nested-factor specification
anova nnn round size*round, class(round size)
test round size*round, symbolic
anova , regress
lincom _b[round[1]] - _b[round[3]] - _b[size[1]*round[3]] / 2

// Cell means model
generate cell = round * 1000 + size
anova nnn cell, category(cell)
test cell, symbolic
anova , regress detail
lincom _b[cell[1]] - ( _b[cell[3]] + _b[cell[4]] ) / 2

exit




