On Tuesday, David wrote:
>Hi all,
>When I use ttest to test pre- vs. post-treatment means, I get one value
>for t. When I use lincom with the respondent_id as the psu, I get a
>different value. Is the ttest command preferred, and if so, what is the
>difference in the two tests?
David is correct: one can obtain the same test statistic from -ttest-
and -lincom- following -svymean- (with option psu() specified in -svyset-).
Here is an example:
clear
set seed 1234
mat R=(1,0.5\0.5,1)
mat m=(0,10)
drawnorm sample1 sample2, n(10) corr(R) m(m)
ttest sample2=sample1
qui{
gen id=_n
reshape long sample, i(id) j(treat)
svyset, psu(id)
svymean sample, by(treat)
}
lincom [sample]2-[sample]1
Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 sample2 |      10    9.690428    .2697006    .8528683    9.080322    10.30053
 sample1 |      10   -.1699037    .2851546    .9017379   -.8149681    .4751608
---------+--------------------------------------------------------------------
    diff |      10    9.860331    .3122881    .9875418    9.153886    10.56678
------------------------------------------------------------------------------
Ho: mean(sample2 - sample1) = mean(diff) = 0
 Ha: mean(diff) < 0        Ha: mean(diff) != 0        Ha: mean(diff) > 0
     t = 31.5745               t = 31.5745                t = 31.5745
 P < t =  1.0000         P > |t| =  0.0000            P > t =  0.0000
. qui{
. lincom [sample]2-[sample]1
 ( 1)  - [sample]1 + [sample]2 = 0
------------------------------------------------------------------------------
        Mean |   Estimate   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   9.860331   .3122881    31.57   0.000     9.153886    10.56678
------------------------------------------------------------------------------
One reason that David did not get the same results is that he did an
unpaired -ttest-. He needs to use -reshape- in order to be able to run a
paired -ttest-.
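As a rough sketch of the paired versus unpaired distinction, assuming David's data are in long form and borrowing the variable names from the example above (id, treat, sample; his actual names are not given), the two tests could be run as follows. The simulated data only stand in for his.

```stata
* Simulated stand-in for the real data (same setup as the example above)
clear
set seed 1234
matrix R = (1, 0.5 \ 0.5, 1)
matrix m = (0, 10)
drawnorm sample1 sample2, n(10) corr(R) means(m)
gen id = _n
reshape long sample, i(id) j(treat)

* Long form: -ttest- with by() is the unpaired two-sample test
ttest sample, by(treat)

* Back to wide form, one row per respondent, for the paired test
reshape wide sample, i(id) j(treat)
ttest sample2 == sample1
```

The unpaired test treats the 20 observations as independent, so its standard error ignores the within-respondent correlation, and its t statistic differs from the paired one.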
We can look at this situation from another point of view, in terms
of -regress-:
clear
set seed 1234
mat R=(1,0.5\0.5,1)
mat m=(0,10)
drawnorm sample1 sample2, n(10) corr(R) m(m)
ttest sample2=sample1
qui{
gen id=_n
reshape long sample, i(id) j(treat)
}
xi: regress sample i.treat, cluster(id)
di in green "Mean for sample1: " in yellow _b[_cons]
di in green "Mean for sample2: " in yellow _b[_Itreat_2]+_b[_cons]
di in green "SE(_Itreat_2) w/o adj: " /*
*/in yellow _se[_Itreat_2]*sqrt((20-2)/(20-1))
Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 sample2 |      10    9.690428    .2697006    .8528683    9.080322    10.30053
 sample1 |      10   -.1699037    .2851546    .9017379   -.8149681    .4751608
---------+--------------------------------------------------------------------
    diff |      10    9.860331    .3122881    .9875418    9.153886    10.56678
------------------------------------------------------------------------------
Ho: mean(sample2 - sample1) = mean(diff) = 0
 Ha: mean(diff) < 0        Ha: mean(diff) != 0        Ha: mean(diff) > 0
     t = 31.5745               t = 31.5745                t = 31.5745
 P < t =  1.0000         P > |t| =  0.0000            P > t =  0.0000
. qui{
. xi: regress sample i.treat, cluster(id)
i.treat           _Itreat_1-2         (naturally coded; _Itreat_1 omitted)
Regression with robust standard errors                 Number of obs =      20
                                                       F(  1,     9) =  944.48
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9723
Number of clusters (id) = 10                           Root MSE      =  .87764
------------------------------------------------------------------------------
             |               Robust
      sample |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Itreat_2 |   9.860331   .3208456    30.73   0.000     9.134528    10.58613
       _cons |  -.1699037   .2929685    -0.58   0.576    -.8326443     .492837
------------------------------------------------------------------------------
. di in green "Mean for sample1: " in yellow _b[_cons]
Mean for sample1: -.16990366
. di in green "Mean for sample2: " in yellow _b[_Itreat_2]+_b[_cons]
Mean for sample2: 9.6904276
. di in green "SE(_Itreat_2) w/o adj: " /*
*/in yellow _se[_Itreat_2]*sqrt((20-2)/(20-1))
SE(_Itreat_2) w/o adj: .31228815
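The point estimates of the means are the same, but the estimated standard
errors differ. The difference is due to the multiplier sqrt((N-1)/(N-k))
that -regress- applies with the cluster() option, where N = 20 is the number
of observations and k = 2 is the number of estimated coefficients.
Multiplying the standard error of the _Itreat_2 coefficient by
sqrt((N-k)/(N-1)) removes this adjustment and gives the same t statistic as
the paired -ttest-.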
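The adjustment can be checked by hand. The lines below are only an arithmetic sketch that plugs in the rounded values printed in the output above (N = 20, k = 2, robust standard error .3208456, difference 9.860331); the scalar names are made up for illustration.

```stata
* Values copied (rounded) from the -regress- output above
* N = observations, k = coefficients (_Itreat_2 and _cons)
scalar N    = 20
scalar k    = 2
* Clustered SE of _Itreat_2 and the estimated difference in means
scalar se_r = .3208456
scalar b    = 9.860331

* Undo the sqrt((N-1)/(N-k)) multiplier
scalar se_adj = se_r*sqrt((N-k)/(N-1))
display "Adjusted SE: " se_adj
display "t statistic: " b/se_adj
```

Up to rounding of the inputs, these should agree with the standard error of .3122881 and t = 31.5745 reported by the paired -ttest-.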
Note also that instead of the -svymean- and -lincom- combination, one can
simply use (with the data still -svyset- with psu(id) as above)
xi: svyregress sample i.treat
and get the same test statistic as the paired -ttest-.
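For readers on a newer Stata release, where the svy: prefix and factor-variable notation replace -svyregress- and -xi-, a sketch of the same regression might look like the following. This goes beyond the original example and assumes such a release; the simulated data are the same stand-in as above.

```stata
* Same simulated data as in the examples above
clear
set seed 1234
matrix R = (1, 0.5 \ 0.5, 1)
matrix m = (0, 10)
drawnorm sample1 sample2, n(10) corr(R) means(m)
gen id = _n
reshape long sample, i(id) j(treat)

* Declare respondents as the primary sampling units
svyset id

* The coefficient on 2.treat is the sample2-minus-sample1 difference in means
svy: regress sample i.treat
```

The t statistic to compare with the paired -ttest- is the one reported for the 2.treat coefficient.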
--Yulia