Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Wrong results for Wilcoxon signed ranks test when data have decimal places (even using double)
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Wrong results for Wilcoxon signed ranks test when data have decimal places (even using double)
Date: Thu, 14 Feb 2013 16:42:00 +0000
If you change Stata's code, you take total responsibility.
Also, there is a fundamental misunderstanding here. Using -round()-
with a second argument that is a power of 10 less than 1 is not a way
to get exact decimal arithmetic.
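
To illustrate the point, here is a minimal sketch using two of the values from the example below (0.65 and 0.55, tested against 0.6). Both differences are 0.05 in decimal arithmetic, but neither 0.05 nor the operands can be held exactly in binary, so the stored doubles need not agree. The display formats are chosen only for illustration.

```stata
* Both absolute differences are 0.05 on paper, but the doubles differ
display %21.18f abs(0.65 - 0.6)
display %21.18f abs(0.55 - 0.6)

* The comparison displays 0 (false): the two "ties" are not equal
display abs(0.65 - 0.6) == abs(0.55 - 0.6)

* Rounding to a multiple of 0.01 still returns a binary double,
* which cannot be exactly 0.05
display %21.18f round(abs(0.65 - 0.6), 0.01)
```

So -round()- with a fractional second argument can at best return the nearest double to the decimal value you have in mind, not the decimal value itself.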
The only "fix" for this problem, I believe, is for users to work with
integer equivalents according to how many digits they believe to be exact,
which is what you re-created earlier by multiplying by 100 for your
example. But that has to be at users' discretion.
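
A hedged sketch of that integer approach, assuming the variable -copper- from the quoted example is recorded to exactly two decimal places. The names scale and copper_int and the tolerance 1e-6 are illustrative choices, not part of any official command.

```stata
* Assumption: values are exact to two decimal places, so scale by 100
local scale = 100

* Stop with an error if any scaled value is not an integer
* to within floating-point noise
assert abs(copper * `scale' - round(copper * `scale')) < 1e-6 if !missing(copper)

* Integers of this size are held exactly, so ties are recognized
generate long copper_int = round(copper * `scale')

* The hypothesized median must be scaled the same way (0.6 -> 60)
signrank copper_int = 60
```

The -assert- line is there so that the choice of scale is checked against the data rather than taken on trust.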
Nick
On Thu, Feb 14, 2013 at 4:25 PM, Marta García-Granero
<[email protected]> wrote:
> I think I fixed that myself. I edited signrank.ado (yes, a bit risky, I
> know) and replaced this line (#14):
>
> egen double `ranks' = rank(abs(`diff')) if `touse'
>
> with this one:
>
> egen double `ranks' = rank(round(abs(`diff'),1e-15)) if `touse'
>
> And the program gives the correct output now.
>
> Maybe a future update should take this simple change into account.
On 14/02/2013 17:02, Marta García-Granero wrote:
>> Apologies for sending this twice, but yesterday I tried to piggyback onto
>> another thread ("Rounding Errors Stata 12") that is closely related to
>> this question, and I think my question got lost. Besides, I'm going to
>> explain the problem a bit more (and better).
>>
>> I'm converting some class notes (basic statistics) from SPSS to Stata, and
>> I have found that the way Stata ranks tied data in the Wilcoxon test
>> can sometimes be wrong when the data have decimal places, even using
>> -double- everywhere.
>>
>> The sample dataset comes from the on-line e-book Statistics at Square One
>> (exercise at the end of chapter 1). I am using 64-bit Stata 12.1 (latest
>> update installed) on Windows 7, but I found the same problem with 32-bit
>> Stata 12.1 on Windows XP. The results I get using Stata don't match the
>> ones I got either with my hand calculations or with SPSS.
>>
>> set type double
>> input copper
>> 0.70
>> 0.45
>> 0.72
>> 0.30
>> 1.16
>> 0.69
>> 0.83
>> 0.74
>> 1.24
>> 0.77
>> 0.65
>> 0.76
>> 0.42
>> 0.94
>> 0.36
>> 0.98
>> 0.64
>> 0.90
>> 0.63
>> 0.55
>> 0.78
>> 0.10
>> 0.52
>> 0.42
>> 0.58
>> 0.62
>> 1.12
>> 0.86
>> 0.74
>> 1.04
>> 0.65
>> 0.66
>> 0.81
>> 0.48
>> 0.85
>> 0.75
>> 0.73
>> 0.50
>> 0.34
>> 0.88
>> end
>>
>> * One sample Wilcoxon's test (against population median = 0.6)
>>
>> signrank copper = 0.6
>>
>> * Multiply data by 100 to get rid of decimal places and run the test
>> * again (pop. median = 60)
>> * this time all the output (positive & negative sum of ranks, Z stat & p
>> * value) is correct
>>
>> generate copper100 = round(copper*100)
>> signrank copper100 = 60
>>
>> * Generate the ranks for absolute differences between copper & pop
>> * median for both variables (copper & copper100)
>> * Ranks should have been the same in both cases, but they are not
>> * Notice the difference for cases 5/6/7, 18/19, 22/23/24, 29/30, 32/33
>> * "ranks2" is correct (recognizes all tied data), and leads to the right
>> * Wilcoxon's p-value
>>
>> egen double ranks1 = rank(abs(copper-0.6))
>> egen double ranks2 = rank(abs(copper100-60))
>> generate absdiff = abs(copper-0.6)
>> sort absdiff
>> list absdiff ranks1 ranks2
>>
>> I would label that as a Stata bug. Tied absolute differences are not
>> recognized as such because they differ at the 15th decimal place.
>> Maybe some rounding should be performed before assigning ranks.