Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: is this the correct statistical test to compare non-normally distributed count data

From	Gwinyai Masukume <[email protected]>
To	[email protected]
Subject	Re: st: is this the correct statistical test to compare non-normally distributed count data
Date	Wed, 22 Jan 2014 12:41:11 +0200

Many thanks Nick. Appreciated.
True, a t-test produces a P-value quite similar to the ranksum result,
in some cases almost identical.
Your comments have shed light on this.
Thanks,
Gwinyai
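
For readers following along, the comparison described in this reply can be sketched as below. The variable names a_score and group come from the original post. The -unequal- option and the by-group use of -swilk- are assumptions added here for illustration (the group sizes, 4504 versus 92, are very different); the thread does not show the exact commands that were run.

```stata
* Rank-based comparison, as in the original post
ranksum a_score, by(group)

* t-test comparing means; -unequal- drops the equal-variance assumption
ttest a_score, by(group) unequal

* Shapiro-Wilk test within each group rather than on the pooled data
bysort group: swilk a_score
```

Note that the two tests address different questions: -ttest- compares means, while -ranksum- works on ranks and says nothing directly about means.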

On 1/22/14, Nick Cox <[email protected]> wrote:
> This could be argued several ways. One short summary is that you've
> not told us enough about your data to allow really good advice.
>
> If your variable is a count then in principle, there is an important
> distinction: whether values could (much) exceed 9 or values could only
> be in a limited set, 0(1)9 (or 0(1)10, or whatever). You called them
> scores, so perhaps despite your word "count" they are really ordinal
> grades and not defined by being counted.
>
> If the distribution is (strongly) discrete, then it can't be normal
> and -swilk- is from one point of view incorrect and irrelevant. It
> could be approximately normal, however, other than the discreteness,
> and many researchers would take the opposite point of view and swallow
> the discreteness.
>
> But the overall distribution is not quite the question. You fed all
> the data to -swilk-, but with two groups what matters is the
> distribution within each group, not that of the pooled data.
>
> All that said, it wouldn't surprise me if a t-test produced a P-value
> loosely similar to your -ranksum- result. That's the way t-tests often
> work; in many cases they don't depend that strongly on normality
> (although outliers etc. can be problematic).
>
> The dichotomy that either something is normal or we have to retreat to
> nonparametric testing is (in my view) 1950s thinking. There is a whole
> bundle of possible tests depending on what an appropriate distribution
> is for your data.
>
> Yet more: a t-test compares means. Is that your objective, comparing
> means? If it's your objective then that question can't be answered by
> -ranksum-, as -ranksum- says nothing about means. I have to wonder
> whether your objective is comparing the distributions, in which case
> you are going to learn most from a graphical comparison, not a
> significance test.
>
> Nick
> [email protected]
>
>
> On 21 January 2014 16:07, Gwinyai Masukume <[email protected]> wrote:
>
>> I have the variable a_score, which can take the values 0, 1, 2, up to 9.
>> I have two groups and I want to test whether a_score is the same
>> in the two groups. Since a_score is not normally distributed, I
>> have used a non-parametric test, and the p-value shows that a_score is
>> not significantly different between the two groups if p < 0.05 is
>> considered significant.
>>
>> Have I used the correct test?
>>
>> Kind regards,
>> Gwinyai
>>
>> . swilk a_score
>>
>>                    Shapiro-Wilk W test for normal data
>>
>>     Variable |    Obs       W           V         z       Prob>z
>> -------------+--------------------------------------------------
>>      a_score |   4610    0.99456     13.698     6.850    0.00000
>>
>> .
>> . * non-parametric test
>> . ranksum a_score, by(group)
>>
>> Two-sample Wilcoxon rank-sum (Mann-Whitney) test
>>
>>        group |      obs    rank sum    expected
>> -------------+---------------------------------
>>      Group 1 |     4504    10338974    10352444
>>      Group 2 |       92    224932.5      211462
>> -------------+---------------------------------
>>     combined |     4596    10563906    10563906
>>
>> unadjusted variance   1.587e+08
>> adjustment for ties  -2084329.2
>>                      ----------
>> adjusted variance     1.567e+08
>>
>> Ho: a_score(group==Group 1) = a_score(group==Group 2)
>>              z =  -1.076
>>     Prob > |z| =   0.2818
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>
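
A graphical and tabular comparison of the two distributions, along the lines suggested in the quoted reply, might look like the sketch below. The variable names come from the original post; the choice of commands and options is illustrative only. The two model commands assume that group is a numeric variable, and which of them (if either) is appropriate depends on what a_score really is: -poisson- presumes a genuine count, -ologit- an ordinal grade.

```stata
* Column percentages: the distribution of a_score within each group
tabulate a_score group, column

* Summary statistics by group
tabstat a_score, by(group) statistics(n mean sd median min max)

* Histograms by group, with one bar per integer value
histogram a_score, discrete percent by(group)

* Model-based alternatives to the normal-or-nonparametric dichotomy
poisson a_score i.group
ologit a_score i.group
```

With only 92 observations in the smaller group, the table of column percentages is likely to be as informative as any single P-value.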

