I have a question about interpreting the output of
ksmirnov in Stata.
I have a dataset that records a variable called
VISPROM for an entire population (approx. 190,000
observations), and several samples drawn from that
population, such as paleo (n=43), archaic (n=1026),
and anasazi (n=5412). For each sample I have created
a dummy variable that takes the value 0 for the sample
and 1 for the population, so that when I sort on, say,
the dummy variable paleo, my sample comes first and my
population second. I then use ksmirnov to compare the
two groups as follows:
ksmirnov visprom, by(paleo)
I get the following result:
. sort paleo
. ksmirnov visprom, by(paleo)
Two-sample Kolmogorov-Smirnov test for equality of
distribution functions:

 Smaller group       D       P-value  Corrected
 ----------------------------------------------
 0:                  0.0000    1.000
 1:                 -0.1509    0.177
 Combined K-S:       0.1509    0.353      0.285

Now, according to my text (Stats in Geography, Ebdon),
the critical value of D for 43 degrees of freedom is
.21 at a significance level of .05. Further, the book
says that a K-S D statistic GREATER than the critical
value indicates that the null hypothesis that the
distributions are equal can be rejected, and that
there is evidence of a non-random pattern.
I interpret the above Stata output as follows:
The D statistic for paleo compared with the
population is .1509, which is less than the critical
value of .21 for 43 degrees of freedom. Therefore, I
cannot reject the null hypothesis that the sample and
population distributions are equal at a significance
level of .05.
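As a rough check on the book's figure, here is how I
would reproduce the critical value in Stata. This is
only a sketch: it assumes the table is based on the
usual large-sample approximation for the two-sample
test at the .05 level, 1.36*sqrt((n1+n2)/(n1*n2)).
That formula is my assumption about the table, and the
group sizes are the approximate ones given above.

```stata
* Sketch only: approximate .05 critical value for the two-sample K-S D,
* assuming the large-sample coefficient 1.36 applies to this table.
* n1 = 43 (paleo sample), n2 = 190000 (approximate population size).
display 1.36*sqrt((43 + 190000)/(43*190000))

* 1 if the reported combined D exceeds that critical value, 0 otherwise
display 0.1509 > 1.36*sqrt((43 + 190000)/(43*190000))
```

The first line comes out at roughly .21, in line with
the book's table, so the comparison with D = .1509
does not change.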
Does that sound right?
What about the P-value of 1.000 on the line for group 0?
What do the Corrected values mean, and what is the
Combined K-S line?
I sure wish there were an annotated output for ksmirnov
on the Stata site.
Here is another output that I REALLY don't know what
to do with...
. sort archaic
. ksmirnov visprom, by(archaic)
Two-sample Kolmogorov-Smirnov test for equality of
distribution functions:

 Smaller group       D       P-value  Corrected
 ----------------------------------------------
 0:                  0.0631    0.000
 1:                 -0.0645    0.000
 Combined K-S:       0.0645    0.000      0.000

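For this second run, the same kind of check could be
done against the stored result rather than by retyping
D. A sketch, again assuming the 1.36 large-sample
coefficient for the .05 level and the approximate group
sizes given above (n=1026 and about 190,000). r(D) is
the combined statistic that ksmirnov leaves behind,
which return list should confirm:

```stata
* Sketch only: rerun the test and inspect what it stores in r()
ksmirnov visprom, by(archaic)
return list

* 1 if the combined D exceeds the approximate .05 critical value,
* 0 otherwise (1.36 coefficient and group sizes are assumptions)
display r(D) > 1.36*sqrt((1026 + 190000)/(1026*190000))
```

With these sizes the approximate critical value is
about .043, which D = .0645 exceeds. That would at
least be consistent with the P-values of 0.000, if I
am reading them right.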
Any help is appreciated.
In F,L&T,
Stace D. Maples
School of Social Sciences
Office: GR 3.416
GIS Lab: GR 3.206
University of Texas at Dallas
P.O. Box 830688
Richardson, TX 75083
Phone: (214) 641-0920
[email protected]
www.stacemaples.com
Office & Lab Hours:
Monday 2:00pm - 6:00pm, Saturday 10:00am - noon
GIS Lab: GR 3.206
or by appointment