Re: st: RE: t-test comparing the means of two samples in imputed datasets
From: "Isabel Canette, StataCorp LP" <[email protected]>
To: [email protected]
Subject: Re: st: RE: t-test comparing the means of two samples in imputed datasets
Date: Wed, 04 Nov 2009 17:40:32 -0600
Clara Barata <maria_barata(at)mail(dot)harvard(dot)edu> has multiply imputed data
and wants to perform the equivalent of an unpaired t-test with equal variances:
> Any idea on how to apply a ttest to compare means in datasets imputed with
> MI (stata 11)? What would be the equivalent to: "ttest var , by (dummy)" in
> the MI world?
Let's forget for a moment that she has imputed data. As David Radwin
<dradwin(at)mprinc(dot)com> pointed out in
http://www.stata.com/statalist/archive/2009-11/msg00198.html
an unpaired t-test with equal variances is equivalent to a regression in
which the dependent variable is the variable of interest and the only
independent variable is a dummy that indicates one of the two groups. Here is
an example:
. sysuse auto, clear
(1978 Automobile Data)
. ttest price, by(foreign)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Domestic | 52 6072.423 429.4911 3097.104 5210.184 6934.662
Foreign | 22 6384.682 558.9942 2621.915 5222.19 7547.174
---------+--------------------------------------------------------------------
combined | 74 6165.257 342.8719 2949.496 5481.914 6848.6
---------+--------------------------------------------------------------------
diff | -312.2587 754.4488 -1816.225 1191.708
------------------------------------------------------------------------------
diff = mean(Domestic) - mean(Foreign) t = -0.4139
Ho: diff = 0 degrees of freedom = 72
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.3401 Pr(|T| > |t|) = 0.6802 Pr(T > t) = 0.6599
. regress price foreign
Source | SS df MS Number of obs = 74
-------------+------------------------------ F( 1, 72) = 0.17
Model | 1507382.66 1 1507382.66 Prob > F = 0.6802
Residual | 633558013 72 8799416.85 R-squared = 0.0024
-------------+------------------------------ Adj R-squared = -0.0115
Total | 635065396 73 8699525.97 Root MSE = 2966.4
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
foreign | 312.2587 754.4488 0.41 0.680 -1191.708 1816.225
_cons | 6072.423 411.363 14.76 0.000 5252.386 6892.46
------------------------------------------------------------------------------
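The t-test reported for the variable foreign is the two-tailed test reported by
-ttest-. Because -regress- estimates mean(Foreign) - mean(Domestic), the
coefficient has the opposite sign to diff in -ttest-, so the upper-tail
probability of the regression t statistic matches Pr(T < t) from -ttest-.
We can use the results that -regress- stores in e() to obtain the three
p-values: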
. mat b = e(b)
. mat V = e(V)
. scalar coef_for = el(b,1,1)
. scalar se_for = sqrt(el(V,1,1))
. display 2*ttail(e(df_r), abs(coef_for/se_for))
.68018509
. display ttail(e(df_r), coef_for/se_for)
.34009254
. display ttail(e(df_r), -coef_for/se_for)
.65990746
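
The same three p-values can also be obtained without copying e(b) and e(V)
into matrices. The following is only a sketch of that idea, not part of the
original session: it refers to the coefficient by name through _b[foreign]
and _se[foreign], and the scalar name t_for and the display labels are
illustrative choices. The labels use the sign convention of -ttest-
(diff = mean(Domestic) - mean(Foreign)).

```stata
* Sketch: p-values for foreign, referenced by name instead of matrix position
sysuse auto, clear
regress price foreign

* t statistic for mean(Foreign) - mean(Domestic)
scalar t_for = _b[foreign]/_se[foreign]

* Labels follow -ttest-, where diff = mean(Domestic) - mean(Foreign)
display "Ha: diff != 0   " 2*ttail(e(df_r), abs(t_for))
display "Ha: diff < 0    " ttail(e(df_r), t_for)
display "Ha: diff > 0    " ttail(e(df_r), -t_for)
```

Referring to the coefficient by name avoids relying on foreign being the
first column of e(b), which matters once more regressors are added.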
Now, we can follow the analogous procedure for multiply-imputed data;
this time the test will be performed on the variable rep78, after imputing
it using -mi impute mlogit-.
. sysuse auto, clear
(1978 Automobile Data)
. mi set flong
. mi register imputed rep78
(5 m=0 obs. now marked as incomplete)
. mi impute mlogit rep mpg disp turn, add(20)
Univariate imputation Imputations = 20
Multinomial logistic regression added = 20
Imputed: m=1 through m=20 updated = 0
| Observations per m
|----------------------------------------------
Variable | complete incomplete imputed | total
---------------+-----------------------------------+----------
rep78 | 69 5 5 | 74
--------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
of the number of filled in observations.)
. mi estimate: regress rep78 foreign
Multiple-imputation estimates Imputations = 20
Linear regression Number of obs = 74
Average RVI = 0.0687
Complete DF = 72
DF adjustment: Small sample DF: min = 64.44
avg = 64.54
max = 64.65
Model F test: Equal FMI F( 1, 64.4) = 30.16
Within VCE type: OLS Prob > F = 0.0000
------------------------------------------------------------------------------
rep78 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
foreign | 1.199738 .2184457 5.49 0.000 .7633995 1.636076
_cons | 3.054808 .1189696 25.68 0.000 2.817185 3.292431
------------------------------------------------------------------------------
The t-test reported for foreign is the MI version of the two-tailed t-test.
We can also use the values returned by -mi estimate: regress- to compute
two-tailed and one-tailed p-values:
. scalar coef_for = el(e(b_mi),1,1)
. scalar se_for = sqrt(el(e(V_mi),1,1))
. scalar df_for = el(e(df_mi),1,1)
. display 2*ttail(df_for, abs(coef_for/se_for))
7.213e-07
. display ttail(df_for, coef_for/se_for)
3.606e-07
. display ttail(df_for, -coef_for/se_for)
.99999964
Notice that in the MI framework each coefficient has its own degrees of
freedom (compare min = 64.44 and max = 64.65 in the header above). This is
why I take the degrees of freedom for the first coefficient, foreign, from
the matrix e(df_mi).
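
Since each coefficient has its own entry in e(df_mi), it can be safer to
look up the column by name than to hard-code position 1. The following is a
sketch of the whole MI sequence written that way, under some assumptions
that are not in the session above: the rseed() value is arbitrary and is
there only to make the run reproducible, the variable names are written out
in full, and the matrix and scalar names (b, V, df, col_for, t_for, df_for)
are illustrative. With a different seed, the results will not match the
numbers shown above exactly.

```stata
* Sketch: MI p-values for foreign, looking up its column by name
sysuse auto, clear
mi set flong
mi register imputed rep78

* rseed() value is arbitrary; it only makes the imputations reproducible
mi impute mlogit rep78 mpg displacement turn, add(20) rseed(12345)
mi estimate: regress rep78 foreign

* Copy the MI results, then find the column that belongs to foreign
matrix b  = e(b_mi)
matrix V  = e(V_mi)
matrix df = e(df_mi)
scalar col_for = colnumb(b, "foreign")

* Coefficient-specific t statistic and degrees of freedom
scalar t_for  = el(b, 1, col_for)/sqrt(el(V, col_for, col_for))
scalar df_for = el(df, 1, col_for)

display "two-tailed:  " 2*ttail(df_for, abs(t_for))
display "upper tail:  " ttail(df_for, t_for)
display "lower tail:  " ttail(df_for, -t_for)
```

The same column index is used for e(b_mi), e(V_mi), and e(df_mi), so the
coefficient, its standard error, and its degrees of freedom always refer to
the same regressor.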
-- Isabel
icanette(at)stata(dot)com