I posted my reply directly to Stace yesterday in error, rather than to the
list, so I am now posting it to the list. Apologies,
Neil
> On Sun, 24 Jul 2005 17:58:21 -0700 (PDT) Stace Maples <[email protected]> wrote...
>
> > I have a question about interpretation of the ksmirnov
> > statistic in STATA...
> > I have a dataset that records a variable called
> > VISPROM for an entire population (approx 190,000
> > observations), and of that population, I have several
> > samples such as paleo (n=43), archaic (n=1026),
> > anasazi (n=5412). I have created dummy variables for
> > sorting, giving the sample the value of 0 and the
> > population the value 1, so that when I sort on the
> > dummy variable, my sample is the first value of, say,
> > the dummy variable paleo, and my population is the
> > second value of the dummy variable. Using the
> > ksmirnov test, to compare the two samples as follows:
> >
> > ksmirnov visprom, by(paleo)
> >
> > I get the following result:
> >
> > . sort paleo
> >
> > . ksmirnov visprom, by(paleo)
> >
> > Two-sample Kolmogorov-Smirnov test for equality of distribution functions:
> >
> >  Smaller group       D       P-value  Corrected
> >  ----------------------------------------------
> >  0:                  0.0000    1.000
> >  1:                 -0.1509    0.177
> >  Combined K-S:       0.1509    0.353      0.285
> >
> >
> > Now, according to my text (Stats in Geography, Ebdon),
> > my critical value for D with DoF of 43, is .21 at a
> > significance level of .05. Further, the book says
> > that a KS D statistic that is GREATER than the
> > critical value indicates that the Null Hyp that the
> > distributions are equal can be rejected, and that
> > there is evidence of a non-random pattern.
> >
> > I interpret the above STATA output as follows:
> > The D statistic for paleo compared with the
> > population is .1509, which is less than the critical
> > value of .21 for 43 degrees of freedom. Therefore, I
> > cannot reject the null hypothesis that the sample and
> > population distributions are equal at a significance
> > level of .05.
>
> > Does that sound right?
>
> Your interpretation seems fine to me, but you don't need to refer to
> statistical tables to determine the critical value, as Stata calculates
> the p-value for the test you are performing (and will, if desired,
> calculate the exact p-value).
>
> > What about the p-value of 1 for H0?
>
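> I think there is some confusion: the 0 and 1 are values in the "Smaller
> group" column, not hypothesis labels. Each row is a one-sided test. The
> 0 row tests whether group 0 (the group that comes first when dividing
> your data by the variable paleo) contains smaller values than the second
> group, and the 1 row tests whether the second group contains the smaller
> values. The p-value of 1.000 therefore belongs to the first of these
> one-sided tests.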
>
> > What is the significance of the Corrected values (and
> > what is the Combined)?
> >
>
> Combined is the two-sided test and is asking "Is there a difference
> between these two distributions?" without any regard for which group is
> the smaller/larger.
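
As a rough illustration of how the three rows relate, here is a minimal
Python sketch (not Stata's own code) of the two one-sided statistics and
the combined one, computed from the empirical distribution functions of
two groups. The function names ecdf and ks_two_sample are hypothetical,
and the sign convention (positive where the first group's distribution
function lies above the second's, negative in the other direction) is my
reading of the output layout above, so treat it as an assumption.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Empirical distribution function: fraction of observations <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_two_sample(group0, group1):
    """Return (d_plus, d_minus, d) for two samples.

    d_plus  : largest amount by which group0's ECDF exceeds group1's
              (large when group0 holds the smaller values)
    d_minus : largest amount in the other direction, kept negative
              (assumed to mirror the sign shown in the output)
    d       : combined statistic, the larger of the two in absolute value
    """
    a, b = sorted(group0), sorted(group1)
    # The ECDFs only change at observed values, so checking those is enough.
    diffs = [ecdf(a, x) - ecdf(b, x) for x in a + b]
    d_plus = max(diffs)
    d_minus = min(diffs)
    d = max(d_plus, -d_minus)
    return d_plus, d_minus, d


# Hypothetical toy data: every value in the first group is below the second.
print(ks_two_sample([1, 2, 3], [4, 5, 6]))
```

When the difference between the two distribution functions runs in one
direction only, one of the one-sided statistics is zero and the combined
statistic equals the other in absolute value, which is the same pattern
as the 0.0000 and -0.1509 rows in the paleo output.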
>
> Details of the corrected p-value are given in the manual; it is
> "...obtained by modifying the asymptotic p-value using a numerical
> approximation technique."
>
> [Formulae omitted]
>
> > I sure wish there was an annotated output for ksmirnov
> > on the stata site.
>
> The annotated output can be found in the manuals, [R] ksmirnov
> (pp. 230-233), which also includes a couple of short biographies of
> Kolmogorov and Smirnov. The manuals are an invaluable resource and very
> often contain annotated examples. There have been discussions on the list
> in the past about making the printed manuals available in electronic
> format, but for the various reasons discussed in those postings they are
> not currently available in this format (search the archives if
> interested).
>
> I found the following invaluable when I first came across this group of
> tests...
>
> Conover WJ (1999) Practical Nonparametric Statistics. John Wiley & Sons.
>
> It's a great book and everything I've read in it is explained with
> exceptional clarity.
>
> >
> > Here is another output that I REALLY don't know what
> > to do with...
> >
> >
> > . sort archaic
> >
> > . ksmirnov visprom, by(archaic)
> >
> > Two-sample Kolmogorov-Smirnov test for equality of distribution functions:
> >
> >  Smaller group       D       P-value  Corrected
> >  ----------------------------------------------
> >  0:                  0.0631    0.000
> >  1:                 -0.0645    0.000
> >  Combined K-S:       0.0645    0.000      0.000
> >
>
> Hopefully the interpretation of this is now clearer.
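
To tie the p-values back to the critical-value approach from the
textbook, the Python sketch below uses the standard large-sample
approximation for the two-sample critical value,
c(alpha) * sqrt((n + m) / (n * m)) with c(alpha) = sqrt(-ln(alpha / 2) / 2).
The function name ks_critical_value is hypothetical, and the population
size of 190,000 and the sample sizes are simply the approximate figures
quoted in the original post, so the results are indicative only.

```python
from math import log, sqrt


def ks_critical_value(n, m, alpha=0.05):
    """Approximate large-sample critical value for the two-sample K-S D."""
    c = sqrt(-0.5 * log(alpha / 2))  # roughly 1.36 when alpha = 0.05
    return c * sqrt((n + m) / (n * m))


# Assumed sizes: population of about 190,000 as stated in the post.
POPULATION = 190_000

# (sample name, sample size, combined D reported in the output)
for name, n, d in [("paleo", 43, 0.1509), ("archaic", 1026, 0.0645)]:
    crit = ks_critical_value(n, POPULATION)
    verdict = "reject" if d > crit else "cannot reject"
    print(f"{name}: D = {d:.4f}, critical value ~ {crit:.4f}, {verdict} H0")
```

With these inputs the paleo critical value comes out at about 0.21, above
the observed D of 0.1509, while the archaic critical value is about 0.04,
below the observed D of 0.0645. That agrees with the direction of the
p-values reported in the two outputs.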
>
> HTH
>
> Neil
>
> P.S. - It's Stata not STATA (see
> http://www.stata.com/support/faqs/res/statalist.html#spell)
Neil Shephard
Genetics Statistician
ARC Epidemiology Unit, University of Manchester
[email protected]
[email protected]
"If your result needs a statistician then you should design a better experiment" -
Ernest Rutherford