Re: st: should estat sd reports same sd before and after clustering?
From: Afia Tasneem <[email protected]>
To: [email protected]
Subject: Re: st: should estat sd reports same sd before and after clustering?
Date: Sun, 28 Jul 2013 18:03:48 -0400
Hi Steve,
I am confused. To be clear, SDs are not supposed to change with
clustering, correct? Only SEs are supposed to change with clustering.
Suppose a table reports the mean and SD of class for males and females,
the difference between the two, and the SE and p-value of the difference,
with the cluster design of the experiment taken into account for all
numbers. Which is the correct method to use, option 1 or option 2 below?
Option 1:
Numbers using the following code:
svyset branch
svy: mean `var', over(intervention)
estat sd
lincom [`var']intervention - [`var']control
or
Option 2:
clttest `var', cluster(branch) by(intervention)
Many thanks,
Afia
On Sun, Jul 28, 2013 at 5:05 PM, Steve Samuels <[email protected]> wrote:
>
> Afia:
>
> ------------------------------------------------------------------------
> Intra-cluster correlation = 0.0465
> ------------------------------------------------------------------------
>                     N   Clusts     Mean       SE            95 % CI
> intervention=0    380       11   7.4342   0.2763   [ 6.8186,  8.0498]
> intervention=1    345       14   6.9507   0.2768   [ 6.3527,  7.5488]
> ------------------------------------------------------------------------
>
>
> r(sd_1) and r(sd_2) estimate the SDs that would give the same SEs if
> there were no clustering:
>
> sd1 = n1^.5 x se1
> sd2 = n2^.5 x se2
>
> sd1 = (380)^.5 x .2763
> sd2 = (345)^.5 x .2768
>
> r(sd_2) = 5.141886711364611
> r(sd_1) = 5.385836699859183
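
The relationship above can be checked directly. The lines below are a minimal sketch, not part of the original exchange, and assume that -clttest class, cluster(branch) by(intervention)- has just been run so that r(N_1), r(se_1), r(N_2) and r(se_2) from the return list quoted in this thread are still in memory:

```stata
* Assumes -clttest class, cluster(branch) by(intervention)- was just run,
* so its r() results are still available (display does not clear them).
display "sd1 = " sqrt(r(N_1)) * r(se_1)
display "sd2 = " sqrt(r(N_2)) * r(se_2)

* The same arithmetic with the values from the return list typed in by hand
display sqrt(380) * .2762875930960634
display sqrt(345) * .2768298747832084
```

Both pairs of lines should reproduce r(sd_1) and r(sd_2). In other words, these "SDs" are the clustered SEs scaled back up by the square root of the group size, not descriptive standard deviations of class.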
>
> Steve
>
>
> On Jul 28, 2013, at 3:45 PM, Afia Tasneem wrote:
>
> Dear Steve,
>
> Thank you for your reply. And apologies for not posting the code; I
> am new to statalist.
>
> I would be grateful if you could also answer a few follow up questions:
>
> As you can see from the output below, the standard deviations with and
> without clustering using svyset are almost the same (any reason for the
> very slight difference?): 3.168354 and 2.756693 with clustering, and
> 3.170342 and 2.758793 without clustering, for the control and
> intervention groups respectively. However, the command clttest gives me
> different sds from an ordinary ttest: with clttest, my sds are 5.385 and
> 5.141 for the control and intervention groups respectively, whereas in
> the ordinary ttest, the sds are 3.170342 and 2.758793. Why do I get
> different sds with svyset plus estat sd and with clttest?
>
> Below is the code:
>
> . svyset branch
>
> pweight: <none>
> VCE: linearized
> Single unit: missing
> Strata 1: <one>
> SU 1: branch
> FPC 1: <zero>
>
> . svy: mean class, over(intervention)
> (running mean on estimation sample)
>
> Survey: Mean estimation
>
> Number of strata = 1 Number of obs = 725
> Number of PSUs = 25 Population size = 725
> Design df = 24
>
> control: intervention = control
> intervention: intervention = intervention
>
> --------------------------------------------------------------
> | Linearized
> Over | Mean Std. Err. [95% Conf. Interval]
> -------------+------------------------------------------------
> class |
> control | 7.434211 .3031807 6.808476 8.059945
> intervention | 6.950725 .2003743 6.537172 7.364277
> --------------------------------------------------------------
>
> . estat sd
>
> control: intervention = control
> intervention: intervention = intervention
>
> -------------------------------------
> Over | Mean Std. Dev.
> -------------+-----------------------
> class |
> control | 7.434211 3.168354
> intervention | 6.950725 2.756693
> -------------------------------------
>
> . bysort intervention: sum class
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> -> intervention = control
>
> Variable | Obs Mean Std. Dev. Min Max
> -------------+--------------------------------------------------------
> class | 380 7.434211 3.170342 0 12
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> -> intervention = intervention
>
> Variable | Obs Mean Std. Dev. Min Max
> -------------+--------------------------------------------------------
> class | 345 6.950725 2.758793 0 12
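
On the very slight difference between the -estat sd- and -summarize- figures: the numbers shown in this thread are consistent with -estat sd- scaling the variance by N/(N-1) with the full sample size (N = 725), whereas -summarize- uses each group's own n/(n-1). That reading is inferred from the output above rather than taken from the documentation, so treat it as an assumption. A minimal sketch of the arithmetic:

```stata
* Rescale the -summarize- SDs from the group-size divisor (n-1)/n
* to the full-sample factor N/(N-1), with N = 725 (assumed explanation)
display 3.170342 * sqrt((379/380) * (725/724))   // control, n = 380
display 2.758793 * sqrt((344/345) * (725/724))   // intervention, n = 345
```

These come out at about 3.1684 and 2.7567, in line with the -estat sd- column. If memory of the help file serves, -estat sd- has an srssubpop option that switches to the within-subpopulation version; check -help svy estat- before relying on that.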
>
> However, when I use the command "clttest", my standard deviations do
> change with clustering:
>
> with clttest, my sds are 5.385 and 5.141 for control and intervention
> groups respectively, whereas in the normal ttest, the sds are 3.170342
> and 2.758793 for control and intervention groups respectively.
>
> . clttest class, cluster(branch) by(intervention)
>
> t-test adjusted for clustering
> class by intervention, clustered by branch
> ------------------------------------------------------------------------
> Intra-cluster correlation = 0.0465
> ------------------------------------------------------------------------
>                     N   Clusts     Mean       SE            95 % CI
> intervention=0    380       11   7.4342   0.2763   [ 6.8186,  8.0498]
> intervention=1    345       14   6.9507   0.2768   [ 6.3527,  7.5488]
> ------------------------------------------------------------------------
> Combined          725       14   7.2041   0.1957   [ 6.7992,  7.6091]
> ------------------------------------------------------------------------
> Diff(0-1)         725       25   0.4835   0.3911   [ -0.3256,  1.2926]
>
> Degrees freedom: 23
>
> Ho: mean(-) = mean(diff) = 0
>
> Ha: mean(diff) < 0 Ha: mean(diff) ~= 0 Ha: mean(diff) > 0
> t = 1.2362 t = 1.2362 t = 1.2362
> P < t = 0.8856 P > |t| = 0.2289 P > t = 0.1144
>
> . return list
>
> scalars:
> r(N_2) = 345
> r(N_1) = 380
> r(df_t) = 23
> r(t) = 1.2362
> r(sd_2) = 5.141886711364611
> r(sd_1) = 5.385836699859183
> r(se) = .3911133002996737
> r(m_diff) = .4834856986999512
> r(se_2) = .2768298747832084
> r(se_1) = .2762875930960634
> r(mu_2) = 6.950724601745606
> r(mu_1) = 7.434210300445557
> r(p_l) = .8855657157257124
> r(p_u) = .1144342842742876
> r(p) = .2288685685485752
>
> . ttest class, by(intervention)
>
> Two-sample t test with equal variances
> ------------------------------------------------------------------------------
> Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
> ---------+--------------------------------------------------------------------
> control | 380 7.434211 .1626351 3.170342 7.11443 7.753991
> interven | 345 6.950725 .1485284 2.758793 6.658586 7.242863
> ---------+--------------------------------------------------------------------
> combined | 725 7.204138 .1110214 2.989343 6.986176 7.4221
> ---------+--------------------------------------------------------------------
> diff | .4834859 .2217278 .0481787 .9187931
> ------------------------------------------------------------------------------
> diff = mean(control) - mean(interven) t = 2.1805
> Ho: diff = 0 degrees of freedom = 723
>
> Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
> Pr(T < t) = 0.9852 Pr(|T| > |t|) = 0.0295 Pr(T > t) = 0.0148
>
> Very grateful for your help.
>
> Best regards,
> Afia
>
>
>
>
> On Fri, Jul 26, 2013 at 5:25 PM, Steve Samuels <[email protected]> wrote:
>>
>> The Statalist FAQ requests that you show both your code and results. As
>> you didn't, we have little idea of what you saw. I guess that your
>> -svyset- didn't specify a probability weight.
>>
>> In that case, observations are equally weighted, and the estimated
>> population standard deviation *and* mean must be identical to the sample
>> versions, as given by -summarize-. Clustering, as you noticed, affects
>> only standard errors. The following shows that the sd and mean are
>> affected only by weighting and not by clustering.
>>
>>
>> . sysuse auto, clear
>> . gen mkr = substr(make,1,2)
>>
>> . svyset mkr
>> . svy: mean turn
>> . estat sd
>> . sum turn
>>
>> . svyset mkr [pw = price]
>> . svy: mean turn
>> . estat sd
>> . sum turn [aw = price]
>>
>> Steve
>>
>> On Jul 26, 2013, at 12:25 PM, Afia Tasneem wrote:
>>
>> Dear all,
>>
>> I am working on the analysis of a clustered randomized trial.
>>
>> My standard errors change when I svyset the data to account for
>> clustering. However, the standard deviations after clustering with
>> svyset and estat sd are the same as before clustering (and also the
>> same as from simply using: sum var). Should the sd remain unaffected
>> when the se changes due to clustering? Or is the command "estat sd" not
>> the right one to use to find standard deviations after clustering?
>>
>> Thanks much,
>> Afia
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/