Scott Merryman wrote,
> In the June 2004 issue of the American Economic Review, the back
> cover has an ad from Stata emphasizing the graphics of Stata 8. One
> of the graphs shows a scatter plot with a regression line and
> confidence interval densities. It looks something like the graph on
> page 2 of
>
> http://www.asft.ttu.edu/ansc5403/lecture25.pdf
>
> How does one include the confidence densities in a regression line graph?
This graph superimposes vertical density line plots for the distribution of
the disturbances on a regression line. Such graphs are sometimes seen in
textbooks when trying to provide intuition for linear regression. For data
analysis, the confidence intervals shown by -twoway lfitci y x- are easier to
read, but the graph from the ad has its own appeal. Here is the code used to
produce that graph:
---------------------------------- BEGIN --- regline_ci.do --- CUT HERE -------
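clear
sysuse auto
* keep the foreign cars only and sort by weight, so that -line- draws the
* fitted curve left to right and [3], [17], [21] pick out cars by weight rank
keep if foreign
sort weight
* quadratic fit of mpg on weight
gen weight2 = weight^2
regress mpg weight weight2
* fitted values and the standard errors of the fitted values
predict fit
predict se , stdp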
#delimit ;
/* scatter of the data */
twoway sc mpg weight , pstyle(p3) ms(o) ||
/* observation 3: sideways density, then its drop line */
  fn weight[3] - 1000 * normden(x, `=fit[3]' , `=se[3]') ,
    range(`=fit[3] -5' `=fit[3] +5') horiz pstyle(p1) ||
  fn `=fit[3]' , range(`=weight[3]' `=weight[3]-1000*normden(0, se[3])')
    pstyle(p1) ||
/* observation 17: sideways density, then its drop line */
  fn weight[17] - 1000 * normden(x, `=fit[17]', `=se[17]') ,
    range(`=fit[17]-5' `=fit[17]+5') horiz pstyle(p1) ||
  fn `=fit[17]', range(`=weight[17]' `=weight[17]-1000*normden(0, se[17])')
    pstyle(p1) ||
/* observation 21: sideways density (wider mpg range), then its drop line */
  fn weight[21] - 1000 * normden(x, `=fit[21]' , `=se[21]') ,
    range(`=fit[21] -7' `=fit[21] +7') horiz pstyle(p1) ||
  fn `=fit[21]', range(`=weight[21]' `=weight[21]-1000*normden(0, se[21])')
    pstyle(p1) ||
/* fitted regression curve, followed by the options for the whole graph */
  line fit weight
    , clwidth(*2) legend(off) ytitle(Miles per gallon) xtitle(Weight)
    title("Scatter with Regression Line and Confidence Interval Densities"
    , size(4.8) margin(t=0 b=1.5) span)
;
#delimit cr
---------------------------------- END --- regline_ci.do --- CUT HERE -------
The graph is cute in that the CI densities are not notional, but rather the
actual CIs from our regression of -mpg- on -weight- and -weight- squared. We
pulled the standard errors of the fitted values, obtained with
-predict se , stdp-, at observations 3, 17, and 21 and supplied them to the
-fn- (or -function-) plots, using the -normden()- function to draw the CI
densities (we cheated ever so slightly and did not use a t-distribution).
Note that we scale the result of -normden()- by 1000 so that it looks about
right on the scale of the weight axis -- a scale that runs from 1,500 to
3,500. We need to do this because the X-axis is not scaled as a density. Our
choice of 1000 as the scaling is arbitrary -- we can only compare the relative
heights of the CI densities on this graph. We also took some care to get an
appropriate range in the -mpg- dimension for each of our CI densities: 5 mpg
on either side of the fitted value at observations 3 and 17, and 7 mpg on
either side at observation 21.
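For anyone bothered by the cheat, here is a rough sketch of how a t density
might be substituted at observation 3. It is not the code behind the ad. It
assumes that -tden(df, t)- returns the Student's t density, that -regress-
leaves the residual degrees of freedom in e(df_r), and that the density of the
fitted value is the standardized t density divided by the SE. The file name
and the local macros -df-, -w-, -f-, and -s- are made up for the illustration.
---------------------------------- BEGIN --- regline_t.do --- CUT HERE -------
sysuse auto, clear
keep if foreign
sort weight
gen weight2 = weight^2
regress mpg weight weight2
predict fit
predict se , stdp
* residual degrees of freedom from the regression (assumed to be in e(df_r))
local df = e(df_r)
* weight, fitted value, and SE at observation 3 (hypothetical macro names)
local w = weight[3]
local f = fit[3]
local s = se[3]
#delimit ;
twoway sc mpg weight , pstyle(p3) ms(o) ||
/* t density of the fitted value: standardize x, then divide by the SE */
  function `w' - 1000 * tden(`df', (x - `f') / `s') / `s' ,
    range(`=`f' - 5' `=`f' + 5') horiz pstyle(p1) ||
  line fit weight , legend(off)
;
#delimit cr
---------------------------------- END --- regline_t.do --- CUT HERE -------
The same pattern would apply at observations 17 and 21. With the residual
degrees of freedom available here, the curve should be hard to tell apart from
the normal one by eye, which is why the cheat is only slight.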
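The other three -fn- plots just draw the drop lines from the top of the CI
densities to the regression line. Each is a constant function at the fitted
value, drawn over the range from the observation's weight to that weight less
1000*normden(0, se), which is where the scaled density peaks.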
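To see where those end points come from, the sketch below lists, for each of
the three observations, the SE, the scaled peak height, and the weight at
which the drop line ends. It assumes the two-argument -normden(0, s)- used in
the do-file above, which for a normal density is 1/(s*sqrt(2*pi)); later
releases of Stata document the same function as -normalden()-. The file name
and the local macros -peak- and -tip- are made up for the illustration.
---------------------------------- BEGIN --- regline_peaks.do --- CUT HERE ----
sysuse auto, clear
keep if foreign
sort weight
gen weight2 = weight^2
regress mpg weight weight2
predict fit
predict se , stdp
foreach i in 3 17 21 {
    * scaled height of the density at its mode, in weight units
    local peak = 1000 * normden(0, se[`i'])
    * weight at which the drop line for this observation ends
    local tip = weight[`i'] - `peak'
    display "obs `i':  se = " %6.3f se[`i'] "   peak = " %7.1f `peak' /*
        */ "   drop line from " weight[`i'] " to " %7.1f `tip'
}
---------------------------------- END --- regline_peaks.do --- CUT HERE ------
Because the peak is proportional to 1/se, the observation with the smallest SE
gets the tallest density, which is all that the arbitrary factor of 1000 lets
us compare.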
-- Vince