At 14:02 12/05/03 +0000, Guillaume Frechette wrote:
Dear Statalisters: I have two variables x1 and x2 for which I want to test
the null hypothesis x1 = x2 (say, two-sided at the 10% level). I would
normally use signtest, which I believe takes x = x1 - x2 and compares the
number of positive values of x to a binomial with probability 1/2. Thus, if
you have 5 observations, such that x is the vector [1,2,3,4,5], you would
reject the null. Now add 1 million 0's to x, and the sign test (at least as it is implemented in
Stata) would still reject the null. However, at an "intuitive" level, it
seems to me that x1 and x2 are much more similar in the second case (with
the million observations where they are exactly the same) than in the
original case. My (very limited) understanding of the problem is that
since the variables should be continuous, an x of 0 happens with zero
probability. Is there a test which takes into account my "intuitive"
understanding or is my intuition simply wrong? I apologize for the
non-Stata question. Thanks in advance.
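As a worked check of the 5-observation example, assuming that the exact two-sided p-value is twice the one-sided binomial tail probability (capped at 1), with B a binomial count introduced here for illustration:

```latex
\begin{aligned}
n_{+} &= 5, \quad n_{-} = 0, \quad n = n_{+} + n_{-} = 5,\\
p_{\text{one-sided}} &= \Pr(B \ge 5) = \left(\tfrac{1}{2}\right)^{5} = \tfrac{1}{32} = 0.03125,
  \qquad B \sim \mathrm{Bin}\!\left(5, \tfrac{1}{2}\right),\\
p_{\text{two-sided}} &= \min\left(1,\; 2 \times 0.03125\right) = 0.0625 < 0.10 .
\end{aligned}
```

So the null is rejected at the 10% level, as stated, although it would not be rejected at the 5% level.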
As I understand it, the sign test in Stata works by calculating
sign(x1-x2), which is 1 if x1>x2, -1 if x1<x2, and 0 if x1==x2, and then
compares the number of positive differences with a binomial with probability
1/2 and n equal to the number of non-zero differences. Therefore, the sign
test is testing a hypothesis about Pr(x1>x2|x1!=x2), i.e. the conditional
probability that x1>x2 given that either x1>x2 or x1<x2, excluding the observations where
x1==x2. In the real world of applied statistics, of course, there are no
truly continuous variables (values are recorded to finite precision), so it
is safe to assume a non-zero probability that x1==x2. The sign difference
sign(x1-x2) will therefore be "trinomial", or
multinomial with 3 possible values, 1, -1 and 0. If we are only interested
in the ratio of positives to negatives, then it makes sense to ignore the
zeros.
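The point about ignoring zeros can be written out. Writing n_+, n_- and n_0 for the numbers of positive, negative and zero differences, and q for the conditional probability above (notation introduced here for illustration), the sign test conditions on n = n_+ + n_-:

```latex
\begin{aligned}
q &= \Pr(x_1 > x_2 \mid x_1 \ne x_2), \qquad H_0 : q = \tfrac{1}{2},\\
n_{+} \mid n &\sim \mathrm{Bin}(n, q), \qquad n = n_{+} + n_{-}.
\end{aligned}
```

Since n_0 appears nowhere, adding a million zeros to x leaves n_+ = 5 and n_- = 0, and hence the p-value, unchanged. This is exactly the behaviour Guillaume observed.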
If the number of observations is large, then Guillaume might type
generate sgndif = sign(x1-x2)
ci sgndif
and get a confidence interval for the mean of sign(x1-x2). In this case, if
there are a lot of observations for which x1==x2, then this will reduce the
standard deviation of sign(x1-x2), and therefore the standard error of the
mean of sign(x1-x2), so the confidence interval will be narrow.
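The variance claim follows because S = sign(x1-x2) takes only the values -1, 0 and 1, so S^2 equals 1 exactly when x1 != x2. With \pi = Pr(x1 != x2), \mu = E(S) and N observations (symbols introduced here for illustration):

```latex
\begin{aligned}
\operatorname{Var}(S) &= E(S^2) - \{E(S)\}^2 = \pi - \mu^2 \le \pi,\\
\operatorname{SE}(\bar{S}) &= \sqrt{\frac{\pi - \mu^2}{N}} \le \sqrt{\frac{\pi}{N}}.
\end{aligned}
```

Many ties make \pi small, which shrinks both the standard deviation of S and the standard error of its mean.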
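However, when the proportion of non-zero differences is small, a narrow
confidence interval for the unconditional mean
E(sign(x1-x2))
corresponds to a wide confidence interval for the conditional mean
E( sign(x1-x2) | x1!=x2 )
and it is this conditional mean, equal to 2*Pr(x1>x2|x1!=x2) - 1, that the
sign test is about.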
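The link between the two means is the identity below, where \pi = Pr(x1 != x2) and q = Pr(x1>x2 | x1 != x2) are notation introduced here. Conditional on x1 != x2, S = sign(x1-x2) is 1 with probability q and -1 otherwise:

```latex
\begin{aligned}
E(S \mid x_1 \ne x_2) &= q - (1 - q) = 2q - 1,\\
E(S) &= \pi \, E(S \mid x_1 \ne x_2) + (1 - \pi) \cdot 0 = \pi\,(2q - 1).
\end{aligned}
```

Roughly speaking, if \pi is treated as fixed, an interval [L, U] for E(S) maps to [L/\pi, U/\pi] for the conditional mean, so its width is multiplied by 1/\pi. With very many ties, \pi is tiny and the conditional interval is wide. The sign test's null hypothesis q = 1/2 is the same as E(S | x1 != x2) = 0.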
I hope this helps.
Roger
--
Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom
Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605
Opinions expressed are those of the author, not the institution.