Good question.
-chitesti- and its sibling -chitest- are part of the package
-tab_chi- on SSC. The latest public versions of -chitesti-
and -chitest- are 2.0.0, both from July 2003.
The immediate command -chitesti- in fact calls -chitest-
(with a secret handshake indicating keyboard input).
What happens internally is that the observed and
expected frequencies are put in -float- variables.
A -float- cannot hold all the digits that your
problem needs. I make the expected
frequencies
406694.3598 and 29766.6402
which add to 436461 exactly by virtue of 0.9318 +
0.0682 equalling 1. However, putting them in a -float-
rounds them to 406694.375 and 29766.640625 (which
answers your second question), and the total is then
436461.015625. Of course everything is done in binary
and we are just seeing the decimal representation here.
Here is that difference in hexadecimal:
. di %21x 436461.015625
+1.aa3b410000000X+012
. di %21x 436461
+1.aa3b400000000X+012
So near, and yet so far!
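
For anyone who wants to check the arithmetic outside Stata, here is a minimal sketch in Python. It assumes that packing a number into a 4-byte IEEE float with the standard -struct- module is a fair stand-in for storing it in a Stata -float- variable; the helper name to_float32 is made up for this illustration.

```python
import struct

def to_float32(x):
    """Round a double to the nearest 4-byte IEEE float, then return it as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

total = 436461
# Expected frequencies computed in double precision
exp_double = [0.9318 * total, 0.0682 * total]
# The same values after being squeezed into 4-byte floats
exp_float = [to_float32(e) for e in exp_double]

# The rounded values are exactly representable, so this sum is exact in double
sum_float = sum(exp_float)

print(exp_float)                   # the two stored expected frequencies
print(sum_float, sum_float.hex())  # total of the stored values, decimal and hex
print(float(total).hex())          # the total we wanted, in hex
```

The hexadecimal strings differ in the sixth digit after the point, just as in the -di %21x- output above.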
Now -chitest- squawks if the sum of observed and
the sum of expected differ by more than 0.01 and the
difference here of 0.015625 qualifies.
The absolute difference criterion of 0.01 was just
plucked out of the air when -chitest- was first
written several years ago. For numbers as big as yours
a relative difference criterion would presumably
make more sense.
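
To make the contrast concrete, here is a rough sketch in Python of the two criteria side by side. It is not the code in -chitest-: the function names are hypothetical, and the relative tolerance of 1e-6 is just an assumption chosen for illustration.

```python
def totals_differ_abs(obs_total, exp_total, tol=0.01):
    """Absolute criterion: flag any difference larger than tol."""
    return abs(obs_total - exp_total) > tol

def totals_differ_rel(obs_total, exp_total, tol=1e-6):
    """Relative criterion: scale the difference by the size of the totals.

    The tolerance 1e-6 is an assumed value, not one taken from -chitest-.
    """
    scale = max(abs(obs_total), abs(exp_total), 1.0)
    return abs(obs_total - exp_total) / scale > tol

obs_total = 436461
exp_total = 436461.015625  # total of the expected frequencies held as floats

print(totals_differ_abs(obs_total, exp_total))  # absolute rule fires
print(totals_differ_rel(obs_total, exp_total))  # relative rule does not
```

With totals this large, a difference of 0.015625 is only a few parts in 100 million, so a relative rule would stay quiet.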
Why then is -chitest- telling you both that
the totals are the same and that they are
different? "Same" comes from the display
statement, here equivalent to
. di %8.0g 436461.015625
436461
That format was chosen so that integers show
as integers whenever possible,
without irritating extra ".00000" or whatever.
Here, however, the format hides the small difference.
"Different" comes from the numbers held
in memory, which differ by 0.015625.
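
The same effect can be seen in a small Python sketch. Note the assumption: Python's "%.6g" is used here only as a rough stand-in for Stata's %8.0g, and the two formats are not identical in general.

```python
obs_total = 436461
exp_total = 436461.015625  # what is actually held in memory

# Display with 6 significant digits: the fractional part is dropped
shown = "%.6g" % exp_total
print(shown)

# The comparison uses the stored value, not the displayed one
print(exp_total == obs_total)
print(exp_total - obs_total)
```

The displayed string matches the observed total, while the stored numbers still differ by 0.015625.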
I just rewrote -chitest- and -chitesti- to use
doubles throughout. The results are better:
. chitesti 314795 121666 \ 0.9318*436461 0.0682*436461
observed frequencies from keyboard; expected frequencies from keyboard
Pearson chi2(1) = 3.0e+05 Pr = 0.000
likelihood-ratio chi2(1) = 1.8e+05 Pr = 0.000
+-----------------------------------------------+
| observed     expected    obs - exp    Pearson |
|-----------------------------------------------|
|   314795   406694.360   -91899.360   -144.105 |
|   121666    29766.640    91899.360    532.657 |
+-----------------------------------------------+
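
As an independent check on these numbers, here is a minimal sketch in Python, working in double precision throughout. It uses the textbook formulas for the Pearson and likelihood-ratio statistics and is not a transcription of -chitest-; the P-value line relies on the fact that, for 1 degree of freedom, the upper tail of chi-square at x equals erfc(sqrt(x/2)).

```python
import math

observed = [314795, 121666]
total = sum(observed)
expected = [0.9318 * total, 0.0682 * total]

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_resid = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# Pearson chi-square is the sum of squared Pearson residuals
pearson_chi2 = sum(r * r for r in pearson_resid)

# Likelihood-ratio chi-square: 2 * sum of observed * ln(observed / expected)
lr_chi2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

# Upper-tail probability for 1 degree of freedom
p_value = math.erfc(math.sqrt(pearson_chi2 / 2))

print(pearson_resid)
print(pearson_chi2, lr_chi2, p_value)
```

The residuals and both statistics agree with the output above to the digits shown.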
In short, this is a salutary lesson in precision. The
program author should perhaps read e.g.
http://www.stata.com/support/faqs/data/mod.html
The defence, if there is one, is that the author
grew up in a small house in a small country and still
thinks that using -double- where -float- apparently
will do fine is profligate use of space.
Incidentally, the chi-square test shows a P-value
indistinguishable from 0.
Nick
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of
> Benoit Dulong
> Sent: 10 March 2004 21:53
> To: statalist
> Subject: st: chitesti -- warning -- expected
>
>
> The command
> chitesti 314795 121666 \ 0.9318*436461 0.0682*436461
> produced
>
> Chi-square test:
> observed frequencies from keyboard
> expected frequencies from keyboard
>
> Warning: totals of observed and expected differ
> total
> observed 436461
> expected 436461
>
> Pearson chi2(1) = 304489.6035 Pr = 0.000
> likelihood-ratio chi2(1) = 181321.5938 Pr = 0.000
>
> residuals
> observed expected classic Pearson
> 1. 314795 406694.375 -91899.375 -144.105
> 2. 121666 29766.641 91899.359 532.657
>
> ------------------------------------------------------
>
> QUESTION-1.
> I do not understand the warning because
> observed and expected do NOT differ ?
>
> QUESTION-2
> Expected (1) should be 436461*0.9318 = 406694.3598,
> not 406694.375 ?