Re: st: RE: t-test comparing the means of two samples in imputed datasets

From	"Isabel Canette, StataCorp LP" <[email protected]>
To	[email protected]
Subject	Re: st: RE: t-test comparing the means of two samples in imputed datasets
Date	Wed, 04 Nov 2009 17:40:32 -0600
Clara Barata <maria_barata(at)mail(dot)harvard(dot)edu> has multiply imputed data,
and wants to perform the equivalent of an unpaired t-test with equal variances:

 > Any idea on how to apply a ttest to compare means in datasets imputed with
 > MI (stata 11)?  What would be the equivalent to: "ttest var , by (dummy)" in
 > the MI world?

Let's forget for a moment that she has imputed data.  As David Radwin
<dradwin(at)mprinc(dot)com> pointed out:

         http://www.stata.com/statalist/archive/2009-11/msg00198.html

performing an unpaired t-test with equal variances is equivalent to performing
a regression where the dependent variable is our variable of interest, and the
independent variable is a dummy that indicates one of the two groups.  Here is
an example:

. sysuse auto, clear
(1978 Automobile Data)

. ttest price, by(foreign)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Domestic |      52    6072.423    429.4911    3097.104    5210.184    6934.662
 Foreign |      22    6384.682    558.9942    2621.915     5222.19    7547.174
---------+--------------------------------------------------------------------
combined |      74    6165.257    342.8719    2949.496    5481.914      6848.6
---------+--------------------------------------------------------------------
    diff |           -312.2587    754.4488               -1816.225    1191.708
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -0.4139
Ho: diff = 0                                     degrees of freedom =       72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3401         Pr(|T| > |t|) = 0.6802          Pr(T > t) = 0.6599


. regress price foreign

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    0.17
       Model |  1507382.66     1  1507382.66           Prob > F      =  0.6802
    Residual |   633558013    72  8799416.85           R-squared     =  0.0024
-------------+------------------------------           Adj R-squared = -0.0115
       Total |   635065396    73  8699525.97           Root MSE      =  2966.4

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
       _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
------------------------------------------------------------------------------

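The t-test reported for the variable foreign is the two-tailed test reported by
-ttest-.  Note that -regress- estimates mean(Foreign) - mean(Domestic), so the
coefficient and its t statistic have the opposite sign of diff in -ttest-.  We
can use the results that -regress- stores in e() to obtain the three p-values: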

. mat b = e(b)

. mat V = e(V)

. scalar coef_for = el(b,1,1)

. scalar se_for = sqrt(el(V,1,1))

. display 2*ttail(e(df_r), abs(coef_for/se_for))
.68018509

. display ttail(e(df_r), coef_for/se_for)
.34009254

. display ttail(e(df_r), -coef_for/se_for)
.65990746
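
The correspondence above can also be checked directly against the results
that -ttest- leaves in r().  The following is only a sketch, written as a
do-file; the scalar names (p_two, p_lower, p_upper, t_reg) are my own and are
not part of the original example.

```stata
* Sketch: compare the p-values from -ttest- with those computed from -regress-.
sysuse auto, clear

quietly ttest price, by(foreign)
scalar p_two   = r(p)      // Ha: diff != 0
scalar p_lower = r(p_l)    // Ha: diff < 0
scalar p_upper = r(p_u)    // Ha: diff > 0

quietly regress price foreign
scalar t_reg = _b[foreign]/_se[foreign]

* -regress- estimates mean(Foreign) - mean(Domestic), the negative of
* -ttest-'s diff, so the upper tail of t_reg matches Ha: diff < 0.
display "diff != 0: " 2*ttail(e(df_r), abs(t_reg)) "  vs  " p_two
display "diff <  0: " ttail(e(df_r),  t_reg)       "  vs  " p_lower
display "diff >  0: " ttail(e(df_r), -t_reg)       "  vs  " p_upper
```

Each line should display the same value twice, up to rounding.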

Now, we can follow the analogous procedure for multiply-imputed data;
this time the test will be performed on the variable rep78, after imputing
it using -mi impute mlogit-.

. sysuse auto, clear
(1978 Automobile Data)

. mi set flong

. mi register imputed rep78
(5 m=0 obs. now marked as incomplete)

. mi impute mlogit rep mpg disp turn, add(20)

Univariate imputation                   Imputations =       20
Multinomial logistic regression               added =       20
Imputed: m=1 through m=20                   updated =        0

               |              Observations per m
               |----------------------------------------------
      Variable |   complete   incomplete   imputed |     total
---------------+-----------------------------------+----------
         rep78 |         69            5         5 |        74
--------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled in observations.)

. mi estimate: regress rep78 foreign

Multiple-imputation estimates                     Imputations     =         20
Linear regression                                 Number of obs   =         74
                                                  Average RVI     =     0.0687
                                                  Complete DF     =         72
DF adjustment:   Small sample                     DF:     min     =      64.44
                                                          avg     =      64.54
                                                          max     =      64.65
Model F test:       Equal FMI                     F(   1,   64.4) =      30.16
Within VCE type:          OLS                     Prob > F        =     0.0000

------------------------------------------------------------------------------
       rep78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |   1.199738   .2184457     5.49   0.000     .7633995    1.636076
       _cons |   3.054808   .1189696    25.68   0.000     2.817185    3.292431
------------------------------------------------------------------------------

The t-test reported for foreign is the MI version of the two-tailed t-test.
We can also use the values returned by -mi estimate: regress- to compute
two-tailed and one-tailed p-values:

. scalar coef_for = el(e(b_mi),1,1)

. scalar se_for = sqrt(el(e(V_mi),1,1))

. scalar df_for = el(e(df_mi),1,1)

. display 2*ttail(df_for, abs(coef_for/se_for))
7.213e-07

. display ttail(df_for, coef_for/se_for)
3.606e-07

. display ttail(df_for, -coef_for/se_for)
.99999964
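
The lookups above rely on foreign being the first column of the returned
matrices.  If the model had more covariates, one option would be to find the
column by name instead.  The sketch below is a do-file version under that
assumption; the matrix copies (b_mi, V_mi, df_mi), the scalar t_for, the local
j, and the rseed() value are my own choices, so the numbers it displays will
not match the session above exactly.

```stata
* Sketch: retrieve the MI results for foreign by column name.
sysuse auto, clear
mi set flong
mi register imputed rep78
* rseed() is added only to make the sketch reproducible; the value is arbitrary.
quietly mi impute mlogit rep78 mpg displacement turn, add(20) rseed(12345)
quietly mi estimate: regress rep78 foreign

matrix b_mi  = e(b_mi)
matrix V_mi  = e(V_mi)
matrix df_mi = e(df_mi)

local j = colnumb(b_mi, "foreign")     // column position of foreign

scalar t_for  = b_mi[1, `j'] / sqrt(V_mi[`j', `j'])
scalar df_for = df_mi[1, `j']          // coefficient-specific MI df

display "Pr(|T| > |t|) = " 2*ttail(df_for, abs(t_for))
display "Pr(T > t)     = " ttail(df_for, t_for)
display "Pr(T < t)     = " ttail(df_for, -t_for)
```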


Notice that in the MI framework there is a specific degrees of freedom value
for each coefficient.  This is why I need to take the specific degrees of
freedom for the first coefficient from the matrix e(df_mi).
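
For reference, here is how I understand the small-sample adjustment named in
the output above (Barnard and Rubin 1999) to produce a separate degrees of
freedom for each coefficient.  The notation is mine: M is the number of
imputations, \bar{U}_j and B_j are the within- and between-imputation
variances of coefficient j, and \nu_com is the complete-data degrees of
freedom (72 here).  Treat it as a sketch and check it against the -mi
estimate- documentation.

```latex
\begin{align*}
T_j &= \bar{U}_j + \left(1 + \tfrac{1}{M}\right) B_j, \qquad
r_j = \frac{(1 + 1/M)\, B_j}{\bar{U}_j}, \qquad
\gamma_j = \frac{(1 + 1/M)\, B_j}{T_j}, \\
\nu_{\mathrm{large},j} &= (M - 1)\left(1 + \frac{1}{r_j}\right)^{2}, \qquad
\nu_{\mathrm{obs},j} = \frac{\nu_{\mathrm{com}}\,(\nu_{\mathrm{com}} + 1)}{\nu_{\mathrm{com}} + 3}\,(1 - \gamma_j), \\
\nu_j &= \left(\frac{1}{\nu_{\mathrm{large},j}} + \frac{1}{\nu_{\mathrm{obs},j}}\right)^{-1}.
\end{align*}
```

Because B_j and \bar{U}_j differ from one coefficient to the next, so does
\nu_j, which is why e(df_mi) is a vector rather than a single number.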


-- Isabel
icanette(at)stata(dot)com





