Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Tukey's HSD test from summary statistics
From: [email protected] (Jeff Pitblado, StataCorp LP)
To: [email protected]
Subject: Re: st: Tukey's HSD test from summary statistics
Date: Wed, 08 Feb 2012 13:46:23 -0600
Maria Niarchou <[email protected]> asks
> Is there a way to calculate Tukey's HSD test in Stata when only sample
> sizes, means and standard deviations are available?
The short answer is: Yes.
New in Stata 12 are the functions -tukeyprob()- and -invtukeyprob()- that
compute cumulative probabilities and quantiles from Tukey's studentized range
distribution.
-----------------------------------------------------------------------------
Here is the longer answer with some formulas, followed by an example.
Suppose we have k means to compare, where mean m_i and standard deviation s_i
were computed from group i having sample size n_i.
Our first problem is to determine how to estimate the standard error of a
given difference, say
SE(m_1-m_2) = ?
Assuming a common variance between the k groups, we can pool the sample
variance estimates to get
MSE = (1/df) sum_i (n_i-1)*s_i^2
where
df = sum_i (n_i - 1)
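For readers who want to check the pooling step outside Stata, here is a minimal Python sketch. The helper name pooled_mse is hypothetical (it is not a Stata function); it takes the group sample sizes and standard deviations and returns (MSE, df). Applied to the two groups in the -ttesti- example below, the pooled standard error of m_1 - m_2 comes out at about 1.256135, the Std. Err. that -ttesti- reports for diff.

```python
import math


def pooled_mse(ns, sds):
    """Pool group variances: MSE = sum_i (n_i - 1) * s_i^2 / df, with df = sum_i (n_i - 1)."""
    if len(ns) != len(sds):
        raise ValueError("ns and sds must have the same length")
    df = sum(n - 1 for n in ns)
    mse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / df
    return mse, df


# Groups x and y from the -ttesti- example: (n, sd) = (20, 5) and (32, 4).
mse, df = pooled_mse([20, 32], [5, 4])
se_diff = math.sqrt(mse * (1 / 20 + 1 / 32))  # pooled SE of m_1 - m_2
print(mse, df, round(se_diff, 6))
```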
So the HSD test statistic, assuming equal variances, becomes
q = abs(m_1 - m_2)/sqrt(MSE*(1/n_1 + 1/n_2)/2)
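The extra divisor 2 in the square root comes from the fact that we are looking
at the absolute difference between m_1 and m_2 (the range of two means): the
studentized range is scaled by the standard error of a single mean rather than
by the standard error of a difference.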
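The same statistic as a short Python sketch (hsd_q_equal is a hypothetical helper name). Because of the divisor 2, dividing q by sqrt(2) recovers the pooled two-sample t statistic computed with the same MSE; for the two groups in the -ttesti- example below this matches the reported t = 3.9805.

```python
import math


def hsd_q_equal(m1, m2, n1, n2, mse):
    """Tukey HSD (Tukey-Kramer) statistic assuming a common variance MSE."""
    return abs(m1 - m2) / math.sqrt(mse * (1 / n1 + 1 / n2) / 2)


# Pooled MSE for groups x and y in the -ttesti- example: (19*5^2 + 31*4^2) / 50.
mse = (19 * 5 ** 2 + 31 * 4 ** 2) / 50
q = hsd_q_equal(20, 15, 20, 32, mse)
t = q / math.sqrt(2)  # the pooled two-sample t statistic
print(round(q, 4), round(t, 4))
```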
A 5% critical value can be computed using the -invtukeyprob()- function.
crit = invtukeyprob(k, df, .95)
The corresponding p-value can be computed using the -tukeyprob()- function.
p = 1 - tukeyprob(k, df, q)
If we can't assume equal variances, then the test statistic becomes
q = abs(m_1 - m_2)/sqrt((s_1^2/n_1 + s_2^2/n_2)/2)
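A Python sketch of the unequal-variance statistic (hsd_q_unequal is a hypothetical helper name). With the summary statistics from the -ttesti- example below it gives q = 5.3452248, and q/sqrt(2) equals the unequal-variance t statistic of 3.7796 that -ttesti, unequal- reports.

```python
import math


def hsd_q_unequal(m1, m2, n1, n2, s1, s2):
    """Studentized-range statistic without assuming equal variances."""
    return abs(m1 - m2) / math.sqrt((s1 ** 2 / n1 + s2 ** 2 / n2) / 2)


# Groups x and y from the -ttesti- example: (n, mean, sd) = (20, 20, 5) and (32, 15, 4).
q = hsd_q_unequal(20, 15, 20, 32, 5, 4)
t = q / math.sqrt(2)  # the unequal-variance t statistic
print(round(q, 7), round(t, 4))
```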
-----------------------------------------------------------------------------
Example 6 in -[R] ttest- performs an unpaired t test assuming equal variances:
***** BEGIN:
. ttesti 20 20 5 32 15 4

Two-sample t test with equal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      20          20    1.118034           5    17.65993    22.34007
       y |      32          15    .7071068           4    13.55785    16.44215
---------+--------------------------------------------------------------------
combined |      52    16.92308    .6943785    5.007235    15.52905     18.3171
---------+--------------------------------------------------------------------
    diff |                   5    1.256135                2.476979    7.523021
------------------------------------------------------------------------------
    diff = mean(x) - mean(y)                                      t =   3.9805
Ho: diff = 0                                     degrees of freedom =       50

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9999         Pr(|T| > |t|) = 0.0002          Pr(T > t) = 0.0001
***** END:
Suppose this test represents only 1 comparison among 5 means (so k = 5), and
let's pretend that sqrt(MSE) is the same as the Std. Dev. for the combined
sample above (5.007235). Also, let's assume the total degrees of freedom is
df = 100.
The HSD test statistic is
q = (20 - 15)/(5.007235*sqrt((1/20 + 1/32)/2))
= 4.9542206
The 5% critical value is
crit = invtukeyprob(k, df, .95)
= 3.9289372
Since q exceeds the critical value, the p-value
p = 1 - tukeyprob(k, df, q)
is below .05.
For unequal variances, the results from -ttesti- are
***** BEGIN:
. ttesti 20 20 5 32 15 4, unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      20          20    1.118034           5    17.65993    22.34007
       y |      32          15    .7071068           4    13.55785    16.44215
---------+--------------------------------------------------------------------
combined |      52    16.92308    .6943785    5.007235    15.52905     18.3171
---------+--------------------------------------------------------------------
    diff |                   5    1.322876                2.311343    7.688657
------------------------------------------------------------------------------
    diff = mean(x) - mean(y)                                      t =   3.7796
Ho: diff = 0                     Satterthwaite's degrees of freedom =  33.9142

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9997         Pr(|T| > |t|) = 0.0006          Pr(T > t) = 0.0003
***** END:
The HSD test statistic is
q = (20 - 15)/sqrt((5^2/20 + 4^2/32)/2)
= 5.3452248
The 5% critical value is still
crit = invtukeyprob(k, df, .95)
= 3.9289372
The p-value is
p = 1 - tukeyprob(k, df, q)
= .00243234
--Jeff
[email protected]
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/