Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: should estat sd reports same sd before and after clustering?
From
Steve Samuels <[email protected]>
To
[email protected]
Subject
Re: st: should estat sd reports same sd before and after clustering?
Date
Sun, 28 Jul 2013 17:05:18 -0400
Afia:
------------------------------------------------------------------------
Intra-cluster correlation = 0.0465
------------------------------------------------------------------------
                  N   Clusts    Mean      SE         95 % CI
intervention=0   380    11     7.4342   0.2763   [ 6.8186, 8.0498]
intervention=1   345    14     6.9507   0.2768   [ 6.3527, 7.5488]
------------------------------------------------------------------------
r(sd_1) and r(sd_2) estimate the SDs that would give the same SEs if there were
no clustering:
sd1 = n1^.5 x se1
sd2 = n2^.5 x se2
sd1 = (380)^.5 x .2763
sd2 = (345)^.5 x .2768
r(sd_2) = 5.141886711364611
r(sd_1) = 5.385836699859183
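As a quick check of the arithmetic above, here is a minimal sketch that recomputes the two SDs from the standard errors and group sizes that -clttest- returned. The scalar names (n1, se1, sd1, and so on) are only illustrative, and the numbers are copied from the -return list- output quoted below rather than read from r() directly.

```stata
* Back out the "no-clustering" SDs from the cluster-adjusted SEs.
* All values are copied from the -clttest- results quoted in this thread.
scalar n1  = 380                  // r(N_1), control group
scalar n2  = 345                  // r(N_2), intervention group
scalar se1 = .2762875930960634    // r(se_1)
scalar se2 = .2768298747832084    // r(se_2)

* sd = sqrt(n) x se
scalar sd1 = sqrt(n1) * se1
scalar sd2 = sqrt(n2) * se2

display "sd1 = " %9.6f sd1
display "sd2 = " %9.6f sd2
```

The displayed values should agree with r(sd_1) and r(sd_2) up to rounding. The same relation run the other way, se = sd/sqrt(n), gives the unadjusted standard errors that -ttest- reports from the ordinary sample SDs.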
Steve
On Jul 28, 2013, at 3:45 PM, Afia Tasneem wrote:
Dear Steve,
Thank you for your reply. And apologies for not posting the code; I
am new to statalist.
I would be grateful if you could also answer a few follow up questions:
As you can see from the output below, the standard deviations with and
without clustering using svyset are almost the same (any reason for the
very slight difference?): 3.168354 and 2.756693 with clustering, and
3.170342 and 2.758793 without clustering, for the control and
intervention groups respectively. However, the command clttest gives me
different SDs: with clttest, my SDs are 5.385 and 5.141 for the control
and intervention groups respectively, whereas in a normal ttest the SDs
are 3.170342 and 2.758793. Why do I get different SDs with svyset plus
estat sd and with clttest?
Below is the code:
. svyset branch
pweight: <none>
VCE: linearized
Single unit: missing
Strata 1: <one>
SU 1: branch
FPC 1: <zero>
. svy: mean class, over(intervention)
(running mean on estimation sample)
Survey: Mean estimation
Number of strata = 1 Number of obs = 725
Number of PSUs = 25 Population size = 725
Design df = 24
control: intervention = control
intervention: intervention = intervention
--------------------------------------------------------------
| Linearized
Over | Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
class |
control | 7.434211 .3031807 6.808476 8.059945
intervention | 6.950725 .2003743 6.537172 7.364277
--------------------------------------------------------------
. estat sd
control: intervention = control
intervention: intervention = intervention
-------------------------------------
Over | Mean Std. Dev.
-------------+-----------------------
class |
control | 7.434211 3.168354
intervention | 6.950725 2.756693
-------------------------------------
. bysort intervention: sum class
-------------------------------------------------------------------------------------------------------------------------------------------
-> intervention = control
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
class | 380 7.434211 3.170342 0 12
-------------------------------------------------------------------------------------------------------------------------------------------
-> intervention = intervention
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
class | 345 6.950725 2.758793 0 12
However, when I use the command "clttest," my standard deviations do
change with clustering:
with clttest, my SDs are 5.385 and 5.141 for the control and intervention
groups respectively, whereas in a normal ttest the SDs are 3.170342 and
2.758793 for the control and intervention groups respectively.
. clttest class, cluster(branch) by(intervention)
t-test adjusted for clustering
class by intervention, clustered by branch
------------------------------------------------------------------------
Intra-cluster correlation = 0.0465
------------------------------------------------------------------------
                  N   Clusts    Mean      SE         95 % CI
intervention=0   380    11     7.4342   0.2763   [ 6.8186, 8.0498]
intervention=1   345    14     6.9507   0.2768   [ 6.3527, 7.5488]
------------------------------------------------------------------------
Combined 725 14 7.2041 0.1957 [ 6.7992, 7.6091]
------------------------------------------------------------------------
Diff(0-1) 725 25 0.4835 0.3911 [ -0.3256, 1.2926]
Degrees freedom: 23
Ho: mean(-) = mean(diff) = 0
Ha: mean(diff) < 0 Ha: mean(diff) ~= 0 Ha: mean(diff) > 0
t = 1.2362 t = 1.2362 t = 1.2362
P < t = 0.8856 P > |t| = 0.2289 P > t = 0.1144
. return list
scalars:
r(N_2) = 345
r(N_1) = 380
r(df_t) = 23
r(t) = 1.2362
r(sd_2) = 5.141886711364611
r(sd_1) = 5.385836699859183
r(se) = .3911133002996737
r(m_diff) = .4834856986999512
r(se_2) = .2768298747832084
r(se_1) = .2762875930960634
r(mu_2) = 6.950724601745606
r(mu_1) = 7.434210300445557
r(p_l) = .8855657157257124
r(p_u) = .1144342842742876
r(p) = .2288685685485752
. ttest class, by(intervention)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
control | 380 7.434211 .1626351 3.170342 7.11443 7.753991
interven | 345 6.950725 .1485284 2.758793 6.658586 7.242863
---------+--------------------------------------------------------------------
combined | 725 7.204138 .1110214 2.989343 6.986176 7.4221
---------+--------------------------------------------------------------------
diff | .4834859 .2217278 .0481787 .9187931
------------------------------------------------------------------------------
diff = mean(control) - mean(interven) t = 2.1805
Ho: diff = 0 degrees of freedom = 723
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.9852 Pr(|T| > |t|) = 0.0295 Pr(T > t) = 0.0148
Very grateful for your help.
Best regards,
Afia
On Fri, Jul 26, 2013 at 5:25 PM, Steve Samuels <[email protected]> wrote:
>
> The Statalist FAQ requests that you show both your code and results. As
> you didn't, we have little idea of what you saw. I guess that your
> -svyset- didn't specify a probability weight.
>
> In that case, observations are equally weighted, and the estimated
> population standard deviation *and* mean must be identical to the sample
> versions, as given by -summarize-. Clustering, as you noticed, affects
> only standard errors. The following shows that the sd and mean are
> affected only by weighting and not by clustering.
>
>
> . sysuse auto, clear
> . gen mkr = substr(make,1,2)
>
> . svyset mkr
> . svy: mean turn
> . estat sd
> . sum turn
>
> . svyset mkr [pw = price]
> . svy: mean turn
> . estat sd
> . sum turn [aw = price]
>
> Steve
>
> On Jul 26, 2013, at 12:25 PM, Afia Tasneem wrote:
>
> Dear all,
>
> I am working on the analysis of a clustered randomized trial.
>
> My standard errors change when I svyset the data to account for
> clustering. However, the standard deviations after clustering with
> svyset and using estat sd are the same as before clustering (also the
> same as simply using: sum var). Should the sd remain unaffected with
> changes in se due to clustering? Or is the command "estat sd" not the
> right one to use to find standard deviations after clustering?
>
> Thanks much,
> Afia
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/