-----Original Message-----
Brendan wrote:
I am working with a dataset containing 30,000 observations. Some of
the explanatory variables are continuous. If I try the usual tests
for normality, the sample is too large for swilk or sfrancia, and
sktest returns "absurdly" large values and rejects the hypothesis of
a normal distribution. The frequency histogram, cumulative frequency
plot and normal probability plot all look normal, with no outliers. I
presume that with such a large sample even very small deviations
from normality will lead to a significant result. The Box-Tidwell
test indicates that the relationship in the model is linear for all
these continuous variables. Is it safe to ignore the sktest results?
Regards
Brendan
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
Dear Brendan,
I think you are right.
Quoting Svend Juul's (really helpful) textbook "An Introduction to Stata for
Health Researchers" (Stata Press, 2006, p. 110): "...Significance tests for
normality may, however, be misleading: With large datasets, even unimportant
departures from normality become statistically significant, and the most
important tool is visual inspection".
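To see why sample size alone can drive the rejection, here is a minimal Python sketch. It uses the Jarque-Bera statistic, JB = n/6 * (S^2 + K^2/4), where S is sample skewness and K is excess kurtosis. This is a swapped-in test, not what sktest computes (sktest's statistic is built differently), but its dependence on n is explicit. For a fixed, hypothetical and visually negligible departure (skewness 0.05, excess kurtosis 0.10), the statistic grows linearly with n, so the p-value collapses. The function names are illustrative only, and the chi-squared(2) p-value is an asymptotic approximation.

```python
import math


def jarque_bera(n, skewness, excess_kurtosis):
    """Jarque-Bera statistic from sample size and sample moments.

    JB = n/6 * (S^2 + K^2 / 4), with K the excess kurtosis.
    Under normality JB is asymptotically chi-squared with 2 df.
    """
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)


def chi2_2df_pvalue(stat):
    """Upper-tail p-value of a chi-squared(2) variable: P(X > stat) = exp(-stat/2)."""
    return math.exp(-stat / 2.0)


def sample_moments(xs):
    """Return (skewness, excess kurtosis) using simple moment estimators (divide by n)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    # Hypothetical, tiny departure from normality held fixed while n grows.
    skew, exkurt = 0.05, 0.10
    for n in (300, 3000, 30000):
        jb = jarque_bera(n, skew, exkurt)
        print(f"n={n:>6}  JB={jb:8.3f}  p={chi2_2df_pvalue(jb):.2e}")
```

With these assumed moments, JB is 0.25 (p about 0.88) at n = 300 but 25 (p about 4e-6) at n = 30,000. The shape of the distribution is identical in both cases; only the sample size changed, which is why a plot tells you more about whether the departure matters.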
HTH and Best Regards,
Carlo