Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Wrong results for Wilcoxon signed ranks test when data have decimal places (even using double) |
Date | Thu, 14 Feb 2013 16:42:00 +0000 |
If you change Stata's code, you take total responsibility.

Also, there is a fundamental misunderstanding here. Using -round()- with a
second argument that is a power of 10 less than 1 is not a way to get exact
decimal arithmetic.

The only "fix" for this problem, I believe, is for users to work with
integer equivalents according to how many digits they believe to be exact,
which is what you re-created earlier by multiplying by 100 for your example.
But that has to be at users' discretion.

Nick

On Thu, Feb 14, 2013 at 4:25 PM, Marta García-Granero <mgarciagranero@gmail.com> wrote:
> I think I fixed that myself. I edited signrank.ado (yes, a bit risky, I
> know) and replaced this line (#14):
>
> egen double `ranks' = rank(abs(`diff')) if `touse'
>
> by this one
>
> egen double `ranks' = rank(round(abs(`diff'),1e-15)) if `touse'
>
> And the program gives the correct output now.
>
> Maybe a future update should take this simple change into account.
>
> On 14/02/2013 17:02, Marta García-Granero wrote:
>> Apologies for sending this twice, but yesterday I tried to piggyback onto
>> another thread ("Rounding Errors Stata 12"), although closely related to
>> this question, and I think my question got lost. Besides, I'm going to
>> explain the problem a bit more (and better).
>>
>> I'm converting some class notes (basic statistics) from SPSS to Stata, and
>> I have found that the way Stata ranks tied data in the Wilcoxon test
>> can sometimes be wrong when data have decimal places, even using -double-
>> everywhere.
>>
>> The sample dataset comes from the on-line e-book Statistics at Square One
>> (exercise at the end of chapter 1). I am using Stata 12.1 64 bits (last
>> update installed) on W7, but I found the same problem with Stata 12.1 32
>> bits on Windows XP. The results I get using Stata don't match the ones I
>> got either with my hand calculations or with SPSS.
>>
>> set type double
>> input copper
>> 0.70
>> 0.45
>> 0.72
>> 0.30
>> 1.16
>> 0.69
>> 0.83
>> 0.74
>> 1.24
>> 0.77
>> 0.65
>> 0.76
>> 0.42
>> 0.94
>> 0.36
>> 0.98
>> 0.64
>> 0.90
>> 0.63
>> 0.55
>> 0.78
>> 0.10
>> 0.52
>> 0.42
>> 0.58
>> 0.62
>> 1.12
>> 0.86
>> 0.74
>> 1.04
>> 0.65
>> 0.66
>> 0.81
>> 0.48
>> 0.85
>> 0.75
>> 0.73
>> 0.50
>> 0.34
>> 0.88
>> end
>>
>> * One sample Wilcoxon's test (against population median = 0.6)
>>
>> signrank copper = 0.6
>>
>> * Multiply data by 100 to get rid of decimal places and run the test
>> * again (pop. median = 60)
>> * this time all the output (positive & negative sum of ranks, Z stat & p
>> * value) is correct
>>
>> generate copper100 = round(copper*100)
>> signrank copper100 = 60
>>
>> * Generating the ranks for absolute differences between copper & pop
>> * median for both variables (copper & copper100)
>> * Ranks should have been the same in both cases, but they are not
>> * Notice the difference for cases 5/6/7, 18/19, 22/23/24, 29/30, 32/33
>> * "ranks2" is correct (recognizes all tied data), and leads to the right
>> * Wilcoxon's p-value
>>
>> egen double ranks1 = rank(abs(copper-0.6))
>> egen double ranks2 = rank(abs(copper100-60))
>> generate absdiff = abs(copper-0.6)
>> sort absdiff
>> list absdiff ranks1 ranks2
>>
>> I would label that as a Stata bug. Tied absolute differences are not
>> recognized as such because there is a difference at the 15th decimal place.
>> Maybe some rounding should be performed before assigning ranks.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
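
The point about integer equivalents can be checked on a single pair of values. What follows is a minimal sketch, not part of the original thread. It assumes only that -display- evaluates its expressions in double precision. The values 0.65 and 0.55 both appear in the posted data and are each 0.05 away from 0.6 in decimal terms, but none of 0.65, 0.55 and 0.6 has an exact binary representation, so the two absolute differences need not be equal as doubles. The choice of this pair and the scaling factor of 100 are illustrative.

```stata
* Illustrative pair from the posted data: both are 0.05 from 0.6 in decimals.
* Show the two absolute differences to 18 decimal places.
display %21.18f abs(0.65 - 0.6)
display %21.18f abs(0.55 - 0.6)

* Compare them as doubles: 1 means tied, 0 means not tied.
display abs(0.65 - 0.6) == abs(0.55 - 0.6)

* Integer equivalents (hundredths, an assumption about the exact digits):
* 65, 55 and 60 are held exactly, so the tie is recognized.
display abs(round(0.65*100) - 60) == abs(round(0.55*100) - 60)
```

Under IEEE double-precision arithmetic the first comparison should display 0 and the second 1, which matches the pattern reported for ranks1 and ranks2 above. How many digits to treat as exact, and hence which factor to scale by, remains a decision for the user.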