Maarten Buis wrote:
I'm going to add a few "yes, but" nitpicks to the solid advice Maarten wrote:
>1) Regression never assumes that the dependent variable is
normally distributed, except when you have no explanatory
variables. It only assumes that the residuals are normally
distributed. <
Well, yes, in a sense. But note that least squares can be thought of as a procedure for *finding* the vector of residuals that is most normal, so residuals that look fine do not rule out a really crummy model.
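Example:
Fit an OLS regression to data generated from the true model
Y_ij ~ Binomial(n, p_ij)
logit(p_ij) = a + b1*x1 + b2*x2
where a = 0, b1 = 1.5, b2 = 1.5 and x1, x2 are binary, i.e., a 2x2 layout.
The true model is additive on the logit scale, but the effects are not additive on the probability scale, so the OLS fit will need an interaction term between x1 and x2 to be adequate.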
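To make the arithmetic concrete, here is a minimal sketch in Python (not Stata; the language is my choice for illustration). It assumes equal cell sizes and works with the expected cell proportions rather than simulated data. The names expit, interaction and additive_resid are made up for this example.

```python
import math

def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Parameters from the example: additive on the logit scale, no interaction.
a, b1, b2 = 0.0, 1.5, 1.5

# Expected proportion in each cell of the 2x2 layout.
p = {(x1, x2): expit(a + b1 * x1 + b2 * x2)
     for x1 in (0, 1) for x2 in (0, 1)}

# Interaction contrast on the probability scale. This is the coefficient
# an OLS fit with an x1*x2 term would need to reproduce the cell means.
interaction = p[1, 1] - p[1, 0] - p[0, 1] + p[0, 0]

# Least-squares additive fit (no interaction) for a balanced 2x2 layout:
# fitted value = row mean + column mean - grand mean.
grand = sum(p.values()) / 4
row = {x1: (p[x1, 0] + p[x1, 1]) / 2 for x1 in (0, 1)}
col = {x2: (p[0, x2] + p[1, x2]) / 2 for x2 in (0, 1)}
additive_resid = {(x1, x2): p[x1, x2] - (row[x1] + col[x2] - grand)
                  for x1 in (0, 1) for x2 in (0, 1)}

for cell in sorted(p):
    print(cell, round(p[cell], 4), round(additive_resid[cell], 4))
print("interaction contrast:", round(interaction, 4))
```

The contrast comes out at about -0.18, so the additive OLS model misses every cell mean by a quarter of that in absolute value, even though the generating model has no interaction on the logit scale.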
>2) Testing for the normality of the residuals should only
be done once you are convinced that the other assumptions
have been met, as violations of the other assumptions are
likely to lead to residuals that look non-normal<
Of the ways residuals can look non-normal, skewness is the most important, hands down; one wouldn't be considering lognormal distributions in the first place without the presence of skew.
>3) The normality of the residuals is probably the least
important of the regression assumptions, as regression
is reasonably robust to violations of it.<
Depends on the violations in question!
>4) Tests are probably not the best way to assess whether
the errors are normally distributed. Graphical inspection
is usually more informative and powerful, see:
-help diagnostic plots- and -ssc d hangroot- for tools
to help with that.<
Absolutely.
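For anyone who wants to see what such a plot is built from, here is a rough sketch in Python of the coordinates behind a normal quantile-quantile plot. This is a different diagnostic from the hanging rootogram that -hangroot- draws, and it is not the Stata implementation. The function name qq_points and the plotting position (i - 0.5)/n are assumptions for illustration, and the "residuals" are a deterministic lognormal-shaped sample standing in for skewed residuals.

```python
import math
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Return (theoretical, observed) pairs for a normal Q-Q plot.

    Observed values are standardized residuals in ascending order;
    theoretical values are standard normal quantiles at (i - 0.5) / n.
    """
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    observed = sorted((r - m) / s for r in residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

# Stand-in for skewed residuals: exponentiated normal quantiles,
# i.e. a sample with an exactly lognormal shape.
n = 200
skewed = [math.exp(NormalDist().inv_cdf((i - 0.5) / n)) for i in range(1, n + 1)]

points = qq_points(skewed)

# Points near the line observed == theoretical suggest normality.
# Here both tails sit above the line, the signature of right skew.
print("lowest point: ", points[0])
print("highest point:", points[-1])
```

Plotting the pairs against the 45-degree line shows the direction and location of the departure, which a single test statistic cannot do.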
JV