I was dopey on this. As David Airey pointed out
privately, it is pretty clear that John wants _observed_ means
plotted as well as fitted means.
This can be done quite(*) easily with a little preparation.
Consider
. sysuse auto, clear
. anova mpg rep78 foreign
            Number of obs =      69     R-squared     =  0.2825
            Root MSE      = 5.16246     Adj R-squared =  0.2256

      Source |  Partial SS    df          MS          F   Prob > F
 -----------+----------------------------------------------------
       Model |  661.189524     5  132.237905       4.96     0.0007
             |
       rep78 |  179.189006     4  44.7972516       1.68     0.1655
     foreign |  111.773747     1  111.773747       4.19     0.0447
             |
    Residual |  1679.01337    63  26.6510059
 -----------+----------------------------------------------------
       Total |   2340.2029    68  34.4147485

. egen mean = mean(mpg) if e(sample) , by(rep78)
(5 missing values generated)
Crucial detail: the -if e(sample)- can be important when
there are missing values, to ensure that you get comparable results.
It does no harm even if there aren't. This command must follow
the -anova-; otherwise e(sample) either isn't defined
or may be an inappropriate e(sample) left over from another
model.
If you want different means, the handle is the -by()-
option.
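To see what the -if e(sample)- buys you, and what changing
-by()- does, here is a small sketch along the same lines.
The variable names cellmean and formean are made up for
illustration. The -count- should agree with the Number of obs
reported by -anova-.

. sysuse auto, clear
. anova mpg rep78 foreign
. * how many observations did -anova- actually use?
. count if e(sample)
. * observed mean for each rep78 x foreign cell
. egen cellmean = mean(mpg) if e(sample) , by(rep78 foreign)
. * observed mean for each category of foreign, ignoring rep78
. egen formean = mean(mpg) if e(sample) , by(foreign)
. * check the cell means against a table
. tabulate rep78 foreign if e(sample) , summarize(mpg) means
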
. anovaplot , plot(scatter mean rep78 , ms(Dh))
      legend(order(2 "Domestic" 3 "Foreign" 4 "observed means"))
In short, the trick is to calculate the means separately
and then use -anovaplot-'s -plot()- option to show
them superimposed. If you think that's clever, it is,
and all your applause should be directed at StataCorp
for inventing it. Stata 9 users should note that
the last version of -anovaplot- I know about was
for Stata 8, so -addplot()-, the same thing but
with a Stata 9 name, does not yet work.
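If you would rather not depend on -plot()- at all, much the
same picture can be assembled by hand with -twoway-. What
follows is only a sketch for a do-file, and I am not claiming
it is how -anovaplot- does things internally. The variable
name fitted and the legend text are my own choices.

    sysuse auto, clear
    anova mpg rep78 foreign
    * observed means, restricted to the estimation sample
    egen mean = mean(mpg) if e(sample) , by(rep78)
    * fitted means from the model just estimated
    predict fitted if e(sample)
    * data, one fitted line per category of foreign, observed means
    twoway (scatter mpg rep78 if e(sample), ms(oh))            ///
           (connected fitted rep78 if foreign == 0, sort)      ///
           (connected fitted rep78 if foreign == 1, sort)      ///
           (scatter mean rep78, ms(Dh)),                       ///
           legend(order(1 "Data" 2 "Domestic" 3 "Foreign"      ///
                        4 "observed means"))
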
The tricky bit is getting the legend right. This
issue arose recently in a discussion of the user-
written program -glcurve- by P. van Kerm and S. Jenkins.
The best tip is to view the source to see what -anovaplot-
is doing under the {hood | bonnet}.
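For example, something like the following should locate the
file and list it in the Results window. This is only a
sketch: it assumes that -anovaplot- is installed somewhere
along your adopath.

. * where is the ado-file, and which version is it?
. which anovaplot
. * get the full path into r(fn), then list the file
. findfile anovaplot.ado
. type "`r(fn)'"
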
(*) "quite" in English, meaning British, meaning
"moderately". "quite" in American appears to
mean "extremely". Thus an American speaker at a London
Stata users' meeting who thanked a questioner for
a "quite helpful" comment got some quite puzzled
looks from the audience.
Nick
Nick Cox
> For those not in the know, -anovaplot- is
> a user-written command.
>
> . search anovaplot
>
> points to a write-up:
>
> SJ-4-4  gr0009  . . . . . . . . . .  Speaking Stata: Graphing model diagnostics
>         (help anovaplot, indexplot, modeldiag, ofrtplot, ovfplot,
>         qfrplot, racplot, rdplot, regplot, rhetplot, rvfplot2,
>         rvlrplot, rvpplot2 if installed)
>         Q4/04   SJ 4(4):449--475
>         plotting diagnostic information calculated from residuals
>         and fitted values from regression models with continuous
>         responses
>
> Now on the question, I'm not clear what John wants
> that -anovaplot- does not provide,
> as the main purpose of -anovaplot- is precisely to
> show means according to anova factors.
>
> Thus
>
> . sysuse auto, clear
>
> . anova mpg rep78 foreign
>
>             Number of obs =      69     R-squared     =  0.2825
>             Root MSE      = 5.16246     Adj R-squared =  0.2256
>
>       Source |  Partial SS    df          MS          F   Prob > F
>  -----------+----------------------------------------------------
>        Model |  661.189524     5  132.237905       4.96     0.0007
>              |
>        rep78 |  179.189006     4  44.7972516       1.68     0.1655
>      foreign |  111.773747     1  111.773747       4.19     0.0447
>              |
>     Residual |  1679.01337    63  26.6510059
>  -----------+----------------------------------------------------
>        Total |   2340.2029    68  34.4147485
>
> . anovaplot
>
> gives me two parallel segmented lines showing means fitted
> as a function of the factors, plus point symbols for the
> data.
>
> (For some unknown reason, ANOVA people tend to plot just
> means, and not the original data, but the author of -anovaplot-
> evidently does not approve. Any regression person
> showing just a straight line would get told pretty promptly
> to add the data by any competent referee or boss.)
>
> Nick
>
> John Novak
>
> > I would like to add a plot of the treatment means collapsed
> > across the by variable to an -anovaplot-. I have done this:
> >
> > #delimit ;
> > quietly anova y a b a*b;
> > anovaplot ,
> > scatter(msymbol(i) xsize(3) ysize(3) name(by_b, replace))
> > plot(mband y a) ;
> > #delimit cr
> >
> > It is almost what I want, but adds the median band instead
> > of a mean band. Does anyone know how I can accomplish the
> > same effect, but with means instead of medians?