Thanks to Kit Baum, various new and revised packages have
been placed on SSC. All are products of an ongoing project
aimed at producing a suite of model diagnostic graphical
routines to complement official Stata's -regdiag-.
Slides from some talks at users group meetings
reflecting different aspects of this project are at
http://www.stata.com/support/meeting/8uk/diag.pdf
http://www.stata.com/support/meeting/8uk/diag.html
(London and Maastricht, May 2002; but graphics
all Stata 7)
http://fmwww.bc.edu/repec/nasug2003/CoxNASUG2003.pdf
(Boston, March 2003; graphics Stata 8)
To install or replace these packages, use -ssc-.
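For example, something like the following should fetch
all three, assuming the packages are still hosted on SSC
under these names (-replace- overwrites any older copy
already installed):
. ssc install rvfplot2, replace
. ssc install rvpplot2, replace
. ssc install rhetplot, replace
To see what a package contains before installing it,
-ssc describe rhetplot- lists its files.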
revised: rvfplot2
=================
-rvfplot2- is offered as a generalisation of official
Stata's -rvfplot- for residual vs fitted plots after
fitting regression-type commands. -rvfplot2- has been
rewritten for Stata 8. The previous version, which
was written for Stata 7, remains in the package
as -rvfplot27-.
revised: rvpplot2
=================
A similar story:
-rvpplot2- is offered as a generalisation of official
Stata's -rvpplot- for residual vs predictor plots after
fitting regression-type commands. -rvpplot2- has been
rewritten for Stata 8. The previous version, which
was written for Stata 7, remains in the package
as -rvpplot27-.
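A minimal sketch of typical calls, on the assumption
that the basic syntax mirrors that of the official
commands (the help files are the authority on the
options; the model here is chosen only for illustration):
. sysuse auto, clear
. regress mpg weight length
. * residual vs fitted plot
. rvfplot2
. * residual vs one predictor
. rvpplot2 weight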
new: rhetplot
=============
-rhetplot- (think "residual heteroscedasticity plot")
is offered as a fairly general graphical tool for
checking for heteroscedasticity of errors. Stata 8 is required.
Here is the rhetoric behind -rhetplot-, more discursively
than in the help file.
Homoscedastic errors are assumed in many
model fits; checking residuals for heteroscedasticity
is thus advisable. Graphically this can be done
in various ways, including -rvfplot- (or -rvfplot2-,
above) or -rdplot- (from SSC). -rhetplot- is another
way to do it.
The generic idea is this: divide the data into
subsets, calculate the standard deviation of
residuals in each subset and plot the standard
deviations to see whether they are similar or different
(and, if different, whether there is some overall pattern).
Sometimes the division into subsets is naturally
determined. Suppose you
. webuse systolic
. anova systolic drug disease
Here the question is whether errors have
similar variability in cells defined by
combinations of -drug- and -disease-, and this
is the way to do it:
. rhetplot, by(drug disease)
The graph has the sd of residuals on the y axis
and the results of -egen <tempvar> = group(drug disease),
label- on the x axis. To get the benefit of the
labelling in this case, add an -xlabel()- option:
. rhetplot, by(drug disease) xlabel(1/12, valuelabel)
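A rough by-hand equivalent may make the idea concrete.
It is meant only as a sketch, not as the code inside
-rhetplot-, and the variable names res, cell, sd_res
and tag are made up for illustration:
. webuse systolic, clear
. anova systolic drug disease
. * residuals from the fitted model
. predict double res, residuals
. * one labelled group per drug-disease combination
. egen cell = group(drug disease), label
. * sd of residuals within each cell
. egen sd_res = sd(res), by(cell)
. * plot one point per cell
. egen tag = tag(cell)
. scatter sd_res cell if tag, xlabel(1/12, valuelabel)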
Other times any division into subsets is at
least a little arbitrary. The handles provided
in -rhetplot- are those provided in -egen, cut()-,
namely its -at()- and -group()- options, although
I typically use only the latter.
Suppose I
. sysuse auto
. regress turn length
I might want to slice -length- into
quantile-based groups. To see some detail,
but not too much, use the magical
number seven, plus or minus two
(http://psychclassics.yorku.ca/Miller/):
. rhetplot length, group(7)
Strictly, what appears on the x axis
in this case is the mean of -length-
within each group.
You don't need to specify a variable
(and if you have more than one covariate,
there may not be an obvious choice in any
case). Given
. rhetplot, group(7)
the slices are of the fitted values. With a
single predictor the fitted values are just a
linear function of -length-, so in this case the
plot is in essence the same. So this choice is like
taking a residual vs fitted plot,
slicing it vertically and plotting
the sd of residuals in each
slice against the mean of the
fitted values in each slice.
The graph shown in each case
is a call to -lowess-, so unless
the number of groups is very
small, you get -lowess-'s idea
of a smooth. This can be helpful
informally for getting an idea
of the structure of variability.
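By hand, the fitted-value version amounts to something
like the following. This is only a sketch of the idea,
not the code inside -rhetplot-, and the variable names
fit, res, slice, sd_res, mean_fit and tag are made up
for illustration:
. sysuse auto, clear
. regress turn length
. * fitted values and residuals from the regression
. predict double fit
. predict double res, residuals
. * seven quantile-based slices of the fitted values
. egen slice = cut(fit), group(7)
. * sd of residuals and mean of fitted within each slice
. egen sd_res = sd(res), by(slice)
. egen mean_fit = mean(fit), by(slice)
. * one point per slice, with a lowess smooth
. egen tag = tag(slice)
. lowess sd_res mean_fit if tag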
Suppose you
. insheet using http://www.kgs.ku.edu/Mathgeo/Books/Stat/ASCII/OCS.TXT, clear
That's some data which I'll explain by
. label data "petroleum reservoirs, Outer Continental Shelf, Texas and Louisiana"
. label var mmboe "ultimate production, million barrels oil equivalent"
. label var area "area of closure, acres"
A simple-minded
. regress mmboe area
followed by
. rhetplot area, g(7)
shows a clear tendency to heteroscedasticity,
and eyeballing suggests that sd of residuals
is approximately proportional to mean of fitted. This is made
clearer by superimposing a line through the origin:
. rhetplot area, g(7) plot(function y = (72/19000) * x, range(0 19000))
As is well known, a constant ratio sd / mean points straight towards
an analysis on the logarithmic scale. Of course,
a little thought, or experience with similar data on sizes,
might have suggested that in the first place. Anyway,
note that here the -plot()-
option comes free by courtesy of -lowess-.
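A sketch of that follow-up, assuming that -mmboe- and
-area- are strictly positive (otherwise the logarithms
are missing); the names ln_mmboe and ln_area are made up
for illustration:
. * if sd is proportional to mean, the sd of the logged
. * response should be roughly constant
. gen double ln_mmboe = ln(mmboe)
. gen double ln_area = ln(area)
. regress ln_mmboe ln_area
. rhetplot ln_area, g(7)
If the log scale is appropriate, the sds in the last
graph should show no marked trend.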
Nick