Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: sign test output

From	Nahla Betelmal <[email protected]>
To	[email protected]
Subject	Re: st: sign test output
Date	Thu, 17 Jan 2013 11:33:44 +0000

Dear Nick,

Thanks for your reply. I will look up the reference, and I will use
-qnorm- as well (thanks for pointing it out).

But if the t-test can work well even when its assumptions are not
satisfied, and I get contradictory results from the sign test (i.e.
t-test: does not reject the null U=0; sign test: rejects the null),
which one should I follow?

Many thanks

Nahla

ttest DA_T_1 == 0

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  DA_T_1 |     346    1.564346     1.68628    31.36663   -1.752338     4.88103
------------------------------------------------------------------------------
    mean = mean(DA_T_1)                                           t =   0.9277
Ho: mean = 0                                     degrees of freedom =      345

    Ha: mean < 0                 Ha: mean != 0                 Ha: mean > 0
 Pr(T < t) = 0.8229         Pr(|T| > |t|) = 0.3542          Pr(T > t) = 0.1771

 While the Kolmogorov-Smirnov test gives:

ksmirnov  DA_T_1 = normal((DA_T_1-DA_T_1_mu)/  DA_T_1_s)

One-sample Kolmogorov-Smirnov test against theoretical distribution
           normal((DA_T_1-DA_T_1_mu)/  DA_T_1_s)

 Smaller group       D       P-value  Corrected
 ----------------------------------------------
 DA_T_1:             0.4878    0.000
 Cumulative:        -0.4330    0.000
 Combined K-S:       0.4878    0.000      0.000
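
The two tests above are not testing the same thing: -ttest- reports Ho: mean = 0, while -signtest- reports Ho: median of DA_T_1 = 0, so they can disagree when the distribution is asymmetric or heavy-tailed. A minimal sketch of how one might look at both quantities side by side, assuming the dataset holding DA_T_1 is already in memory. Apart from -qnorm-, the choice of commands is an illustration and not something prescribed in this thread.

```stata
* Assumption: the dataset containing DA_T_1 is already in memory.

* Mean, median (50th percentile), skewness and kurtosis in one table
summarize DA_T_1, detail

* Quantile-normal plot: departures from the reference line show where
* the distribution differs from a normal, including in the tails
qnorm DA_T_1

* Point estimate and confidence interval for the median
centile DA_T_1, centile(50)

* Reproduce the t test from the summary statistics alone:
* ttesti #obs #mean #sd #hypothesized_mean
ttesti 346 1.564346 31.36663 0
```

The -ttesti- line uses only the numbers printed in the t-test output above, so it should reproduce t = 0.9277 without access to the data.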


On 17 January 2013 10:59, Nick Cox <[email protected]> wrote:
> Sorry; I misread radically what your variable is, and it is helpful
> that you have now explained it.
>
> My suggestion of a binomial confidence interval still makes sense when
> understood in this way: equal numbers of positive and negative
> differences imply a fraction of 0.5 for pr(positive) and also
> pr(negative).
>
> The literature is large and contradictory and the advice you quote
> from somewhere
>
>>  Shapiro-Wilk is used to test normality, when the number of
>> observations is less than 30. Otherwise, we should use
>> Kolmogorov-Smirnov for large sample (as in my sample).
>
> would never be my two sentences of advice. I would always start out
> with -qnorm- and often end with it. Kolmogorov-Smirnov is more
> sensitive in the middle than in the tails of a distribution, which is
> precisely the wrong way round.
>
> All that said, there is a lot of literature to the effect that the
> t-test can work very well even when assumptions are not well
> satisfied. See for example Rupert Miller, Beyond ANOVA
>
> http://www.amazon.com/Beyond-ANOVA-Applied-Statistics-Statistical/dp/0412070111
>
> Nick
>
> On Thu, Jan 17, 2013 at 10:21 AM, Nahla Betelmal <[email protected]> wrote:
>> Dear Nick,
>>
>> Thank you for the comments. The variable I am testing is not binary,
>> and the literature in my field is concerned with whether the mean (median)
>> of this variable is different from zero. So, U is the mean in case the
>> variable is normally distributed, or U is the median in case the
>> distribution is not normal.
>>
>> From my readings in statistics, I know that in order to decide
>> whether to use parametric or non-parametric tests, the normality of
>> the data should be checked first.
>>
>>  Shapiro-Wilk is used to test normality, when the number of
>> observations is less than 30. Otherwise, we should use
>> Kolmogorov-Smirnov for large sample (as in my sample).
>>
>> So, when the test does not reject the null (normality), we should use the
>> parametric test (i.e. t-test), which examines the mean. On the other
>> hand, if the null of normality is rejected, we should use the
>> non-parametric test (sign test) instead, which examines the median (as
>> in my case).
>>
>> Also, as for the comment about robust, I meant exactly what you said (I used
>> the term robust loosely).
>>
>> Thanks for suggesting that I read again; I certainly will.
>>
>> Many thanks again
>>
>> Nahla
>>
>> On 17 January 2013 09:49, Nick Cox <[email protected]> wrote:
>>> Your t-test is testing a quite different hypothesis. If the two states
>>> 0 and 1 of a binary variable have equal frequencies, then its mean is
>>> 0.5, not 0.
>>>
>>> That aside, the t-test can not be more appropriate for a binary
>>> variable than what you have done already, and this is predictable in
>>> advance, as a distribution with two distinct states is not a normal
>>> distribution. You do not need a Kolmogorov-Smirnov test to tell you
>>> that.
>>>
>>> For the record, what I suggested is best not described as a robust
>>> test. It was calculating a confidence interval, and I showed that for
>>> your data the result was robust to the method of calculation, meaning
>>> merely not sensitive. The word "robust" was used informally.
>>>
>>> You never define what you mean by u, so I am not commenting on any
>>> details about u.
>>>
>>> I recommend that you read (or re-read) a good introductory text on
>>> statistics, as you appear confused on some basic matters.
>>>
>>> Nick
>>>
>>> On Thu, Jan 17, 2013 at 7:52 AM, Nahla Betelmal <[email protected]> wrote:
>>>
>>>> Thank you Maarten and Nick  for the great help.
>>>>
>>>>  So, in this case I would reject the null in favour of the alternative
>>>> u>0, as the p-value is 0.000. However, a t-test on the same sample
>>>> gave the opposite result (i.e. it does not reject the null).
>>>>
>>>> ttest DA_T_1 == 0
>>>>
>>>> One-sample t test
>>>> ------------------------------------------------------------------------------
>>>> Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
>>>> ---------+--------------------------------------------------------------------
>>>>   DA_T_1 |     346    1.564346     1.68628    31.36663   -1.752338     4.88103
>>>> ------------------------------------------------------------------------------
>>>>     mean = mean(DA_T_1)                                           t =   0.9277
>>>> Ho: mean = 0                                     degrees of freedom =      345
>>>>
>>>>     Ha: mean < 0                 Ha: mean != 0                 Ha: mean > 0
>>>>  Pr(T < t) = 0.8229         Pr(|T| > |t|) = 0.3542          Pr(T > t) = 0.1771
>>>>
>>>>
>>>> I think this is due to the distribution of the sample, so I performed
>>>> a K-S normality test. It shows that the data are not normally distributed,
>>>> hence I should use the non-parametric sign test instead of the t-test. In
>>>> other words, I would reject the null u=0 in favor of u>0, right?
>>>>
>>>>
>>>> ksmirnov  DA_T_1 = normal((DA_T_1-DA_T_1_mu)/  DA_T_1_s)
>>>>
>>>> One-sample Kolmogorov-Smirnov test against theoretical distribution
>>>>            normal((DA_T_1-DA_T_1_mu)/  DA_T_1_s)
>>>>
>>>>  Smaller group       D       P-value  Corrected
>>>>  ----------------------------------------------
>>>>  DA_T_1:             0.4878    0.000
>>>>  Cumulative:        -0.4330    0.000
>>>>  Combined K-S:       0.4878    0.000      0.000
>>>>
>>>>
>>>> N.B. Thank you so much Nick for the robust test you mentioned; I will
>>>> use that as well.
>>>>
>>>> Many thanks
>>>>
>>>> Nahla
>>>>
>>>> On 16 January 2013 09:33, Nick Cox <[email protected]> wrote:
>>>>> In addition, it could be as or more useful to think in terms of
>>>>> confidence intervals. With this sample size and average, 0.5 lies well
>>>>> outside 95% intervals for the probability of being positive, and that
>>>>> is robust to method of calculation:
>>>>>
>>>>> . cii 346 221
>>>>>
>>>>>                                                          -- Binomial Exact --
>>>>>     Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
>>>>> -------------+---------------------------------------------------------------
>>>>>              |        346    .6387283    .0258248        .5856497    .6894096
>>>>>
>>>>> . cii 346 221, jeffreys
>>>>>
>>>>>                                                          ----- Jeffreys -----
>>>>>     Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
>>>>> -------------+---------------------------------------------------------------
>>>>>              |        346    .6387283    .0258248        .5871262    .6880204
>>>>>
>>>>> . cii 346 221, wilson
>>>>>
>>>>>                                                          ------ Wilson ------
>>>>>     Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
>>>>> -------------+---------------------------------------------------------------
>>>>>              |        346    .6387283    .0258248        .5868449    .6875651
>>>>>
>>>>> Nick
>>>>>
>>>>> On Wed, Jan 16, 2013 at 9:13 AM, Maarten Buis <[email protected]> wrote:
>>>>>> On Wed, Jan 16, 2013 at 9:38 AM, Nahla Betelmal wrote:
>>>>>>> I have generated this output using the non-parametric one-sample
>>>>>>> sign test with null U=0 and alternative U>0.
>>>>>>>
>>>>>>> However, I do not understand the output. Where is the p-value? Is it
>>>>>>> 0.5 in all cases, or 0.000 (as in the first and third cases) and
>>>>>>> 1.000 (as in the second case)?
>>>>>>>
>>>>>>> . signtest DA_T_1 = 0
>>>>>>>
>>>>>>> Sign test
>>>>>>>
>>>>>>>         sign |    observed    expected
>>>>>>> -------------+------------------------
>>>>>>>     positive |         221         173
>>>>>>>     negative |         125         173
>>>>>>>         zero |           0           0
>>>>>>> -------------+------------------------
>>>>>>>          all |         346         346
>>>>>>>
>>>>>>> One-sided tests:
>>>>>>>   Ho: median of DA_T_1 = 0 vs.
>>>>>>>   Ha: median of DA_T_1 > 0
>>>>>>>       Pr(#positive >= 221) =
>>>>>>>          Binomial(n = 346, x >= 221, p = 0.5) =  0.0000
>>>>>>
>>>>>> The p-value is the last number, so in your case 0.0000. The stuff
>>>>>> before the p-value tells you how it is computed: it is based on the
>>>>>> binomial distribution, and in particular it is the chance of observing
>>>>>> 221 successes or more in 346 trials when the chance of success at each
>>>>>> trial is .5. For this test this chance is the p-value, and it is very
>>>>>> small, less than 0.00005. If you type in Stata -di binomialtail(346,
>>>>>> 221, 0.5)- you will see that this chance is 1.381e-07, i.e.
>>>>>> 0.0000001381.
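
The calculation Maarten describes can be checked directly. A short sketch using only the counts shown in the sign test output (346 observations, 221 positive). The -bitesti- line is an addition made here for convenience and is not mentioned anywhere in the thread.

```stata
* Upper-tail binomial probability: Pr(X >= 221) with n = 346, p = 0.5
display binomialtail(346, 221, 0.5)

* The same number in fixed-point notation rather than scientific notation
display %12.10f binomialtail(346, 221, 0.5)

* Immediate form of the exact binomial test on the same counts:
* bitesti #N #successes #p
bitesti 346 221 0.5
```

The first -display- is the command quoted in the post, which reports the result as 1.381e-07. The fixed-point format makes it easier to count the leading zeros when writing the value out in decimal form.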
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
