Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Tukey's HSD test from summary statistics

From	Muhammad Anees <[email protected]>
To	[email protected]
Subject	Re: st: Tukey's HSD test from summary statistics
Date	Thu, 9 Feb 2012 10:12:58 +0500

Thanks Jeff for the detailed and very informative response.

Best,
Anees

On Thu, Feb 9, 2012 at 12:46 AM, Jeff Pitblado, StataCorp LP
<[email protected]> wrote:
> Maria Niarchou <[email protected]> asks
>
>> Is there a way to calculate Tukey's HSD test in Stata when only sample
>> sizes, means and standard deviations are available?
>
> The short answer is: Yes.
>
> New in Stata 12 are the functions -tukeyprob()- and -invtukeyprob()- that
> compute cumulative probabilities and quantiles from Tukey's studentized range
> distribution.
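
As a quick check of the two functions, the sketch below asks for a 95% quantile and feeds it back into -tukeyprob()-. The values k = 5 and df = 100 are illustrative choices (they match the example later in this message), and the scalar names are arbitrary. The argument order is (k, df, x) for -tukeyprob()- and (k, df, p) for -invtukeyprob()-.

```stata
* Illustrative values: k = 5 means, df = 100 error degrees of freedom
* 95% quantile of the studentized range distribution
scalar crit = invtukeyprob(5, 100, .95)
* Cumulative probability at that quantile; should return .95 up to rounding
scalar back = tukeyprob(5, 100, crit)
display "crit = " crit "   tukeyprob at crit = " back
```
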
>
> -----------------------------------------------------------------------------
>
> Here is the longer answer with some formulas, followed by an example.
>
> Suppose we have k means to compare, where mean m_i and standard deviation s_i
> were computed from group i having sample size n_i.
>
> Our first problem is to determine how to estimate the standard error of a
> given difference, say
>
>        SE(m_1-m_2) = ?
>
> Assuming a common variance between the k groups, we can pool the sample
> variance estimates to get
>
>        MSE = (1/df) sum_i (n_i-1)*s_i^2
>
> where
>
>        df = sum_i (n_i - 1)
>
> So the HSD test statistic, assuming equal variances, becomes
>
>        q = abs(m_1 - m_2)/sqrt(MSE*(1/n_1 + 1/n_2)/2)
>
> The extra divisor 2 in the square root is there because the studentized range
> is scaled by the standard error of a single mean, not by the standard error
> of the difference between m_1 and m_2.
>
> A 5% critical value can be computed using the -invtukeyprob()- function.
>
>        crit = invtukeyprob(k, df, .95)
>
> The corresponding p-value can be computed using the -tukeyprob()- function.
>
>        p = 1 - tukeyprob(k, df, q)
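
The formulas above can be strung together with scalars. This is only a sketch: the three groups and their sample sizes, means, and standard deviations are hypothetical numbers made up for illustration, and the scalar names are arbitrary. It assumes no variables in memory share these names.

```stata
* Hypothetical summary statistics for k = 3 groups (illustration only)
scalar k  = 3
scalar n1 = 10
scalar m1 = 20
scalar s1 = 5
scalar n2 = 12
scalar m2 = 15
scalar s2 = 4
scalar n3 = 15
scalar m3 = 17
scalar s3 = 4.5

* Pooled variance estimate (MSE) and its degrees of freedom
scalar df  = (n1 - 1) + (n2 - 1) + (n3 - 1)
scalar MSE = ((n1 - 1)*s1^2 + (n2 - 1)*s2^2 + (n3 - 1)*s3^2)/df

* HSD statistic for group 1 versus group 2, assuming equal variances
scalar q = abs(m1 - m2)/sqrt(MSE*(1/n1 + 1/n2)/2)

* 5% critical value and p-value from the studentized range distribution
scalar crit = invtukeyprob(k, df, .95)
scalar p    = 1 - tukeyprob(k, df, q)
display "q = " q "   crit = " crit "   p = " p
```

The other pairwise comparisons reuse the same MSE, df, and crit; only the means and sample sizes in the q line change.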
>
> If we can't assume equal variances, then the test statistic becomes
>
>        q = abs(m_1 - m_2)/sqrt((s_1^2/n_1 + s_2^2/n_2)/2)
>
> -----------------------------------------------------------------------------
>
> Example 6 in -[R] ttest- performs an unpaired ttest assuming equal variances
>
> ***** BEGIN:
> . ttesti 20 20 5 32 15 4
>
> Two-sample t test with equal variances
> ------------------------------------------------------------------------------
>         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
> ---------+--------------------------------------------------------------------
>       x |      20          20    1.118034           5    17.65993    22.34007
>       y |      32          15    .7071068           4    13.55785    16.44215
> ---------+--------------------------------------------------------------------
> combined |      52    16.92308    .6943785    5.007235    15.52905     18.3171
> ---------+--------------------------------------------------------------------
>    diff |                   5    1.256135                2.476979    7.523021
> ------------------------------------------------------------------------------
>    diff = mean(x) - mean(y)                                      t =   3.9805
> Ho: diff = 0                                     degrees of freedom =       50
>
>    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
>  Pr(T < t) = 0.9999         Pr(|T| > |t|) = 0.0002          Pr(T > t) = 0.0001
> ***** END:
>
> Suppose this test represents only 1 comparison among k = 5 means, and let's
> pretend that sqrt(MSE) is the same as the Std. Dev. in the combined row above.
> Also, let's assume the total degrees of freedom is df = 100.
>
> The HSD test statistic, using the sample sizes n_1 = 20 and n_2 = 32, is
>
>        q       = abs(20 - 15)/(5.007235*sqrt((1/20 + 1/32)/2))
>                = 4.9542206
>
> The 5% critical value is
>
>        crit    = invtukeyprob(5, 100, .95)
>                = 3.9289372
>
> The p-value is
>
>        p       = 1 - tukeyprob(5, 100, 4.9542206)
>
> which is below .05, since q exceeds the 5% critical value.
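
The equal-variance calculation can also be run directly in Stata. The sketch below takes the sample sizes (20 and 32) and means (20 and 15) from the -ttesti- output, and takes k = 5, df = 100, and sqrt(MSE) = 5.007235 from the assumptions stated for this example. The scalar names are arbitrary.

```stata
* Inputs: sample sizes and means from the -ttesti- output,
* k, df, and root MSE as assumed in the example
scalar k     = 5
scalar df    = 100
scalar rmse  = 5.007235
scalar n1    = 20
scalar n2    = 32

* HSD statistic assuming a common variance
scalar q = abs(20 - 15)/(rmse*sqrt((1/n1 + 1/n2)/2))

* 5% critical value and p-value
scalar crit = invtukeyprob(k, df, .95)
scalar p    = 1 - tukeyprob(k, df, q)
display "q = " %9.7f q "   crit = " %9.7f crit "   p = " %9.8f p
```
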
>
> For unequal variances, the results from -ttesti- are
>
> ***** BEGIN:
> . ttesti 20 20 5 32 15 4, unequal
>
> Two-sample t test with unequal variances
> ------------------------------------------------------------------------------
>         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
> ---------+--------------------------------------------------------------------
>       x |      20          20    1.118034           5    17.65993    22.34007
>       y |      32          15    .7071068           4    13.55785    16.44215
> ---------+--------------------------------------------------------------------
> combined |      52    16.92308    .6943785    5.007235    15.52905     18.3171
> ---------+--------------------------------------------------------------------
>    diff |                   5    1.322876                2.311343    7.688657
> ------------------------------------------------------------------------------
>    diff = mean(x) - mean(y)                                      t =   3.7796
> Ho: diff = 0                     Satterthwaite's degrees of freedom =  33.9142
>
>    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
>  Pr(T < t) = 0.9997         Pr(|T| > |t|) = 0.0006          Pr(T > t) = 0.0003
> ***** END:
>
> The HSD test statistic is
>
>        q       = (20 - 15)/sqrt((5^2/20 + 4^2/32)/2)
>                = 5.3452248
>
> The 5% critical value is still
>
>        crit    = invtukeyprob(k, df, .95)
>                = 3.9289372
>
> The p-value is
>
>        p       = 1 - tukeyprob(k, df, q)
>                = .00243234
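
The unequal-variance numbers can be checked the same way. This sketch uses the sample sizes, means, and standard deviations from the -ttesti- output, and keeps k = 5 and df = 100 as assumed in the example. The scalar names are arbitrary.

```stata
* Summary statistics from the -ttesti- output above
scalar n1 = 20
scalar m1 = 20
scalar s1 = 5
scalar n2 = 32
scalar m2 = 15
scalar s2 = 4

* HSD statistic without assuming a common variance
scalar q = abs(m1 - m2)/sqrt((s1^2/n1 + s2^2/n2)/2)

* 5% critical value and p-value, with k = 5 and df = 100 as assumed above
scalar crit = invtukeyprob(5, 100, .95)
scalar p    = 1 - tukeyprob(5, 100, q)
display "q = " %9.7f q "   crit = " %9.7f crit "   p = " %9.8f p
```

Note that q here equals sqrt(2) times the t statistic reported by -ttesti, unequal- (3.7796), which is a convenient cross-check.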
>
> --Jeff
> [email protected]
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



-- 

Best
---------------------------
Muhammad Anees
Assistant Professor/Programme Coordinator
COMSATS Institute of Information Technology
Attock 43600, Pakistan
http://www.aneconomist.com
