The Stata listserver

Re: st: Signtest (statistics question)


From   Roger Newson <[email protected]>
To   [email protected]
Subject   Re: st: Signtest (statistics question)
Date   Mon, 12 May 2003 15:31:02 +0100

At 14:02 12/05/03 +0000, Guillaume Frechette wrote:
Dear Statalisters: I have two variables x1 and x2 for which I want to test the null hypothesis x1 = x2 (let's say 2 sided at the 10% level). I would normally use Signtest which I believe takes x = x1 - x2 and compares x to a binomial with mean 1/2. Thus, if you have 5 observations, such that x can be written as the vector [1,2,3,4,5] you would reject the null. Now, add 1 million 0's to x and the Signtest (at least as it is implemented in Stata) would still reject the null. However, at an "intuitive" level, it seems to me that x1 and x2 are much more similar in the second case (with the million observations where they are exactly the same) than in the original case. My (very limited) understanding of the problem is that since the variables should be continuous, an x of 0 happens with zero probability. Is there a test which takes into account my "intuitive" understanding or is my intuition simply wrong? I apologize for the non-Stata question. Thanks in advance.
As I understand it, the sign test in Stata works by calculating sign(x1-x2), which is 1 if x1>x2, -1 if x1<x2, and 0 if x1==x2, and then comparing the number of positive differences with a binomial distribution with n equal to the number of non-zero differences and probability 1/2. Therefore, the sign test is testing a hypothesis about Pr(x1>x2|x1!=x2), i.e. the conditional probability that x1>x2 given that either x1>x2 or x1<x2, excluding the observations where x1==x2. In the real world of applied statistics, of course, there are no continuous variables, and it is safe to assume a non-zero probability that x1==x2. The sign difference sign(x1-x2) will therefore be "trinomial", or multinomial with 3 possible values, 1, -1 and 0. If we are only interested in the ratio of positives to negatives, then it makes sense to ignore the zeros.
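
A minimal sketch of how one might check this, using made-up data (five positive differences plus 1,000 ties, rather than a million, purely to keep it quick). The variable values are assumptions for illustration only. If the zeros are indeed dropped, both commands should report the same two-sided p-value, 2 x 0.5^5 = 0.0625:

```stata
* Illustrative data only: 5 positive differences and 1,000 ties
clear
set obs 1005
generate x2 = 0
generate x1 = cond(_n <= 5, _n, 0)

* Sign test on all 1,005 pairs, ties included in the data
signtest x1 = x2

* Binomial test on the 5 non-zero differences alone:
* 5 "successes" out of 5 trials, null probability 0.5
bitesti 5 5 0.5
```

Changing the number of ties in set obs should leave the sign test result unchanged, since only the non-zero differences enter the binomial comparison.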

If the number of observations is large, then Guillaume might type

gene sgndif=sign(x1-x2)
ci sgndif

and get a confidence interval for the mean of sign(x1-x2). In this case, if there are a lot of observations for which x1==x2, then this will reduce the standard deviation of sign(x1-x2), and therefore the standard error of the mean of sign(x1-x2), so the confidence interval will be narrow. However, a narrow confidence interval for the unconditional mean

E(sign(x1-x2))

can still correspond to a wide confidence interval for the conditional mean

E( sign(x1-x2) | x1!=x2 )

which is equivalent to a sign test.
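
To spell out the link between the two means, here is a short derivation. The notation S, p_+, p_- and p_0 is introduced here for illustration and is not part of Stata's output:

```latex
% S = sign(x1 - x2),  p_+ = Pr(x1 > x2),  p_- = Pr(x1 < x2),  p_0 = Pr(x1 = x2)
\begin{align*}
E(S) &= p_+ - p_- \\
E(S \mid x_1 \neq x_2) &= \frac{p_+ - p_-}{p_+ + p_-} = \frac{E(S)}{1 - p_0} \\
\Pr(x_1 > x_2 \mid x_1 \neq x_2) &= \frac{p_+}{p_+ + p_-}
  = \frac{1 + E(S \mid x_1 \neq x_2)}{2}
\end{align*}
```

So the conditional mean is the unconditional mean divided by 1 - p_0. When nearly all differences are zero, p_0 is close to 1, and a very small unconditional mean can go with a conditional mean anywhere between -1 and 1. In Guillaume's example (five positive differences and a million zeros) the sample mean of sign(x1-x2) is 5/1,000,005, or about 0.000005, while the sample mean among the non-zero differences is 1.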

I hope this helps.

Roger


--
Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605
Email: [email protected]

Opinions expressed are those of the author, not the institution.




