Thank you for the prompt and detailed reply. I agree it doesn't make much
sense to run this test over a variable or a series of variables without
some prior insight. I had already checked the data "visually", but what
surprised me in this case was that
the log-normal plot looked really good (as confirmed by a swilk test, p = 0.90),
BUT
the histogram was not very convincing, so I needed a more formal
assessment of normality.
swilk is a useful tool in my case to confirm normality, but my version of
Stata (9) does not print the null hypothesis (H0).
Carlo Georges, DVM
-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of Nick Cox
Sent: Friday, 8 August 2008 16:16
To: [email protected]
Subject: st: RE: RE: swilk test Ho:
Similar questions come up from time to time.
I'll recycle some thoughts given previously. I agree strongly with
Martin's bottom line.
Often it appears that normality testing is just part of some statistical
ritual, and that those participating have lost sight of exactly why they
are doing it. But let's put such vague, impious thoughts aside, and look
at some hard evidence.
A salutary example is near to hand.
. sysuse auto, clear
. swilk price-foreign
                   Shapiro-Wilk W test for normal data

    Variable |    Obs          W          V         z     Prob>z
-------------+--------------------------------------------------
       price |     74    0.76696     15.008     5.909    0.00000
         mpg |     74    0.94821      3.335     2.627    0.00430
       rep78 |     69    0.98191      1.100     0.208    0.41760
    headroom |     74    0.98104      1.221     0.436    0.33137
       trunk |     74    0.97921      1.339     0.637    0.26215
      weight |     74    0.96110      2.505     2.003    0.02258
      length |     74    0.97165      1.825     1.313    0.09461
        turn |     74    0.97113      1.859     1.353    0.08803
displacement |     74    0.92542      4.803     3.423    0.00031
  gear_ratio |     74    0.95814      2.696     2.163    0.01525
     foreign |     74    0.96928      1.978     1.488    0.06838
Let's sort that so the structure is easier to see.
    Variable |    Obs          W          V         z     Prob>z
-------------+--------------------------------------------------
       price |     74    0.76696     15.008     5.909    0.00000
displacement |     74    0.92542      4.803     3.423    0.00031
         mpg |     74    0.94821      3.335     2.627    0.00430
  gear_ratio |     74    0.95814      2.696     2.163    0.01525
      weight |     74    0.96110      2.505     2.003    0.02258
     foreign |     74    0.96928      1.978     1.488    0.06838
        turn |     74    0.97113      1.859     1.353    0.08803
      length |     74    0.97165      1.825     1.313    0.09461
       trunk |     74    0.97921      1.339     0.637    0.26215
    headroom |     74    0.98104      1.221     0.436    0.33137
       rep78 |     69    0.98191      1.100     0.208    0.41760
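Outside Stata, the same reordering can be reproduced from the numbers in the output above. Here is a minimal Python sketch. The list only copies the Obs, W and Prob>z columns, and nothing is recomputed.

```python
# (variable, Obs, W, Prob>z), copied from the -swilk- output above.
results = [
    ("price", 74, 0.76696, 0.00000),
    ("mpg", 74, 0.94821, 0.00430),
    ("rep78", 69, 0.98191, 0.41760),
    ("headroom", 74, 0.98104, 0.33137),
    ("trunk", 74, 0.97921, 0.26215),
    ("weight", 74, 0.96110, 0.02258),
    ("length", 74, 0.97165, 0.09461),
    ("turn", 74, 0.97113, 0.08803),
    ("displacement", 74, 0.92542, 0.00031),
    ("gear_ratio", 74, 0.95814, 0.01525),
    ("foreign", 74, 0.96928, 0.06838),
]

# Smallest W (least straight normal probability plot) first.
by_w = sorted(results, key=lambda row: row[2])
# Sorting on Prob>z instead happens to give the same order here.
by_p = sorted(results, key=lambda row: row[3])

for name, obs, w, p in by_w:
    print(f"{name:>12} | {obs:>4} {w:.5f} {p:.5f}")
```

Because -rep78- has 69 rather than 74 observations, W and Prob>z are not strictly interchangeable as sort keys. They just agree in this example.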
Stepping back, what is non-normality, and why should we care
about it? (For "normal", read "Gaussian" or "central" if you prefer.
The latter was suggested by the physicist Edwin Jaynes.)
Crudely, non-normality could include overall skewness, overall
tail weight differing from normal, granularity, individual
outliers, and whatever else I've forgotten. Shapiro-Wilk collapses
all that onto one dimension by quantifying the straightness of
a normal probability plot. But, crucially, you lose much information
by any such numerical reduction.
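To see what "straightness of a normal probability plot" means in numbers, here is a minimal Python sketch of the closely related Shapiro-Francia statistic W'. It is the squared correlation between the sorted data and approximate expected normal order statistics (Blom's scores, based on (i - 3/8)/(n + 1/4)). This is an illustration only, not Stata's -swilk- algorithm. The function names are made up for the example.

```python
import math
from statistics import NormalDist, fmean


def normal_scores(n):
    """Blom's approximation to the expected order statistics of N(0, 1)."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def plot_straightness(data):
    """Shapiro-Francia W': squared correlation of a normal probability plot.

    Equals 1 when the sorted data lie exactly on a straight line against
    the normal scores; all other information about shape is discarded.
    """
    x = sorted(data)
    m = normal_scores(len(x))
    mx, mm = fmean(x), fmean(m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    return sxm * sxm / (sxx * smm)


scores = normal_scores(74)
w_normal = plot_straightness(scores)                          # straight plot
skewed = [math.exp(s) for s in scores]                        # lognormal shape
w_skewed = plot_straightness(skewed)                          # plot bends: below 1
w_logged = plot_straightness([math.log(v) for v in skewed])   # straight again
print(round(w_normal, 6), round(w_skewed, 3), round(w_logged, 6))
```

Here the log transformation straightens the plot completely only because the data were constructed as exact lognormal quantiles. Real data will not be so obliging.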
To the key point: how far is any column here an indicator of
non-normality that you might care about (or of normality that you
might desire)?
For example, -rep78- is at one extreme of the ranking, but -rep78- is an
ordered categorical variable and in one sense is possibly not
even appropriate for the test. It looks good because it happens to be
unimodal, fairly symmetric and free of outliers. Even -foreign- passes
muster if you use P < 0.05 as a cutoff, even though it's a binary variable.
But why is -foreign- assessed as more nearly normal than
-gear_ratio-? It's, I guess, because it waggles less in the tails
than -gear_ratio-. Yet I really can't imagine -gear_ratio- causing
any problems as either response or predictor, even if there were
some assumption of normality anywhere. On the other hand, -foreign-
really should not be analysed as if it were normal!
Naturally, some of the results here make perfect sense. On -swilk-
(and for that matter on moment- and L-moment-based shape measures)
-price- sticks out as distinctly skew and fat-tailed and probably
best analysed on (say) a logarithmic scale.
But the total picture is this. You can boost Shapiro-Wilk
as much as you like as an omnibus or portmanteau statistic, but
you can't guarantee that it will match what is acceptable to
you or unacceptable to you. Practically, it can send a very
misleading message.
I haven't touched on various other issues.
A key issue is what happens with different sample sizes. Naturally,
I have no idea what sample sizes occur in Carlo's work.
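A rough simulation makes the sample-size point concrete. The Python sketch below uses the Shapiro-Francia W' statistic, a close relative of Shapiro-Wilk W, not Stata's -swilk- itself. Its 5% critical value is estimated by simulating normal samples. The test is then applied to the same mildly skewed distribution at n = 20 and at n = 1000. That distribution is a gamma with shape 20, whose skewness is about 0.45. The distribution, seed, sample sizes and replication counts are all arbitrary assumptions for illustration.

```python
import random
from functools import lru_cache
from statistics import NormalDist, fmean


@lru_cache(maxsize=None)
def normal_scores(n):
    """Blom's approximation to the expected order statistics of N(0, 1)."""
    std = NormalDist()
    return tuple(std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1))


def w_prime(data):
    """Shapiro-Francia W': squared correlation of sorted data with normal scores."""
    x = sorted(data)
    m = normal_scores(len(x))
    mx, mm = fmean(x), fmean(m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    return sxm * sxm / (sxx * smm)


def critical_value(n, reps, rng, alpha=0.05):
    """Estimate the lower alpha point of W' under normality by simulation."""
    sims = sorted(w_prime([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps))
    return sims[int(alpha * reps)]


def rejection_rate(draw, n, reps, crit, rng):
    """Share of simulated samples whose W' falls below the critical value."""
    hits = sum(w_prime([draw(rng) for _ in range(n)]) < crit for _ in range(reps))
    return hits / reps


def mildly_skewed(r):
    """Assumed alternative: gamma(shape=20), skewness 2/sqrt(20), about 0.45."""
    return r.gammavariate(20.0, 1.0)


rng = random.Random(2008)  # arbitrary seed
power = {}
for n in (20, 1000):
    crit = critical_value(n, 400, rng)
    power[n] = rejection_rate(mildly_skewed, n, 200, crit, rng)
    print(f"n = {n:4d}: rejected in {power[n]:.0%} of samples")
```

The same modest skewness is rarely flagged at n = 20 but almost always flagged at n = 1000. A P-value therefore says as much about sample size as about the kind of non-normality that matters for an analysis.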
Perhaps even more important, tests for marginal normality are often not
directly relevant for how a predictor or response behaves within some
larger model.
Nick
[email protected]
Martin Weiss
Well, your H0 is correct. The interpretation of test results is more
intricate, though. Non-rejection of the null does not imply that the
data are normally distributed; it means only that you have not found
convincing evidence against the assertion that they come from a normal
distribution.
Note that the 5% significance level (95% confidence level) that you are
implying in your post means that, even when the data really are normal,
you will falsely reject the null in about 5% of your tests. The
information that tests such as -swilk- provide is less than most users
imagine...
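Both of Martin's points can be checked by simulation. The Python sketch below again uses the Shapiro-Francia W' statistic, a close relative of Shapiro-Wilk W, not Stata's -swilk- itself. It uses a simulated 5% critical value at an assumed n = 15. Fresh normal samples should be rejected about 5% of the time, and every one of those rejections is false. Uniform samples are clearly non-normal, yet at this sample size most of them are not rejected. The sample size, seed and replication counts are arbitrary choices.

```python
import random
from functools import lru_cache
from statistics import NormalDist, fmean


@lru_cache(maxsize=None)
def normal_scores(n):
    """Blom's approximation to the expected order statistics of N(0, 1)."""
    std = NormalDist()
    return tuple(std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1))


def w_prime(data):
    """Shapiro-Francia W': squared correlation of sorted data with normal scores."""
    x = sorted(data)
    m = normal_scores(len(x))
    mx, mm = fmean(x), fmean(m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    return sxm * sxm / (sxx * smm)


def rejection_rate(draw, n, reps, crit, rng):
    """Share of simulated samples whose W' falls below the critical value."""
    return sum(w_prime([draw(rng) for _ in range(n)]) < crit for _ in range(reps)) / reps


rng = random.Random(42)  # arbitrary seed
n, reps = 15, 4000       # assumed sample size and replication count

# Lower 5% point of W' when H0 (normality) is true, estimated by simulation.
null_w = sorted(w_prime([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps))
crit = null_w[int(0.05 * reps)]

# H0 true: every rejection here is a false rejection.
false_rejections = rejection_rate(lambda r: r.gauss(0.0, 1.0), n, reps, crit, rng)
# H0 false (uniform data), yet most samples are not rejected.
uniform_rejections = rejection_rate(lambda r: r.random(), n, reps, crit, rng)

print(f"normal data rejected:  {false_rejections:.1%}")
print(f"uniform data rejected: {uniform_rejections:.1%}")
```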
Carlo Georges
In using the Shapiro-Wilk test for testing normality, is it correct that
the null hypothesis is H0: the data are normally distributed, so that when
p < 0.05 we reject H0 and the data are not normally distributed?
Conversely, if p > 0.05, the data are normally distributed.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/