Maarten has already made what I think is by far the most important
point, that marginal normality (Gaussianity) of predictors is not
an issue.

I want to comment on a detail. Whether a histogram or a cumulative
frequency curve "looks normal" is in my view very difficult to judge
reliably. In the case of a histogram there are decisions over bin
width and bin origin that are necessarily arbitrary. Even if
a Gaussian density or distribution function is superimposed,
as the case may be, comparison is still problematic. More
positively, a normal plot [quantile-quantile plot, presumably]
is customised for this problem and far more useful.
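
As a minimal sketch of what a normal quantile plot compares, the Python
snippet below pairs each ordered observation with the quantile expected
from a normal distribution with the same mean and standard deviation.
The function name, the made-up sample and the plotting position
(i - 0.5)/n are all assumptions for illustration; other plotting
positions are in use and this is not a description of what any
particular Stata command does. Points close to the line of equality
indicate approximate normality.

```python
from statistics import NormalDist, mean, stdev


def normal_plot_points(values):
    """Return (theoretical, observed) pairs for a normal quantile plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Reference normal with the sample's own mean and standard deviation.
    reference = NormalDist(mean(ordered), stdev(ordered))
    points = []
    for i, observed in enumerate(ordered, start=1):
        # Assumed plotting position; other conventions exist.
        p = (i - 0.5) / n
        points.append((reference.inv_cdf(p), observed))
    return points


# Made-up values, purely for illustration.
sample = [4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 8.0, 9.7]
for theoretical, observed in normal_plot_points(sample):
    print(f"{theoretical:6.2f} {observed:6.2f}")
```

Unlike a histogram, this involves no choice of bin width or bin origin:
every observation is shown, and departures in the tails are easy to see.
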
An alternative test for normality is given by -omninorm- on SSC.
I don't use it myself much, but it was fun to program.
Brendan
-------
I am working with a dataset containing 30000 observations. Some of the
explanatory variables are continuous. If I perform usual tests for
normality the sample is too large for swilk or for sfrancia, and if I
use sktest it returns "absurdly" large test statistics and rejects the
hypothesis of a normal distribution. The frequency histogram, cumulative
frequency plot and normal plot all look normal with no outliers. I
presume that with such large numbers even very small deviations from
normal will lead to a significant result. The Box-Tidwell test
indicates that the model relationship is linear for all these continuous
variables. Is it safe to ignore the sktest results?
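
The presumption that very small deviations become significant in very
large samples can be checked with a little arithmetic. The Python sketch
below is only an illustration under stated assumptions: it uses the
large-sample standard error sqrt(6/n) for sample skewness under
normality and a simple z test, which is not the exact calculation
performed by sktest, and the skewness value 0.05 and the function name
are made up.

```python
from math import sqrt
from statistics import NormalDist


def skewness_z(skewness, n):
    """Approximate z statistic and two-sided P-value for a sample skewness.

    Assumes the large-sample standard error sqrt(6 / n), which holds
    under normality; this is a simplification, not the sktest method.
    """
    se = sqrt(6.0 / n)
    z = skewness / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# The same small, hypothetical skewness at three sample sizes.
for n in (300, 3000, 30000):
    z, p = skewness_z(0.05, n)
    print(f"n = {n:6d}  z = {z:5.2f}  P = {p:.4f}")
```

With a skewness of 0.05, which would be hard to see on any plot, z is
about 0.35 at n = 300 but about 3.5 at n = 30000. The deviation is the
same size in both cases; only the power to detect it differs.
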