Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: should estat sd reports same sd before and after clustering?
From: Afia Tasneem <[email protected]>
To: [email protected]
Subject: Re: st: should estat sd reports same sd before and after clustering?
Date: Sun, 28 Jul 2013 23:44:56 -0400
Steve, very grateful for your help. Thank you.
On Sun, Jul 28, 2013 at 7:52 PM, Steve Samuels <[email protected]> wrote:
> "To be clear, sd's are not supposed to change with clustering, correct?"
> It depends on which ones. The sample & estimated population SDs do not
> change. The SD* returned by -clttest- is not the sample SD. It satisfies
> the equation:
>
> SE = SD*/sqrt(n)
>
> where the SE is from the *clustered* analysis. Some people find it useful for
> study planning or for characterizing the effect of clustering.
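A short derivation of the relation above, in the thread's notation. The labels SE_cl (SE from the clustered analysis), SE_srs (SE ignoring clustering), s (ordinary sample SD) and n (observations in the group) are introduced here for illustration. It shows that SD* differs from s by exactly the factor by which clustering inflates the SE:

```latex
\[
\mathrm{SE}_{\mathrm{cl}} = \frac{\mathrm{SD}^{*}}{\sqrt{n}}
\quad\Longleftrightarrow\quad
\mathrm{SD}^{*} = \sqrt{n}\,\mathrm{SE}_{\mathrm{cl}}
\]
\[
\text{Without clustering: } \mathrm{SE}_{\mathrm{srs}} = \frac{s}{\sqrt{n}}
\quad\Longrightarrow\quad
\frac{\mathrm{SD}^{*}}{s} = \frac{\mathrm{SE}_{\mathrm{cl}}}{\mathrm{SE}_{\mathrm{srs}}},
\qquad
\left(\frac{\mathrm{SD}^{*}}{s}\right)^{2} = \frac{\mathrm{SE}_{\mathrm{cl}}^{2}}{\mathrm{SE}_{\mathrm{srs}}^{2}}
\]
```

The squared ratio is the variance inflation due to clustering, usually called the design effect, which is one way to read "characterizing the effect of clustering". The sample SD s itself does not move; only SD* and the SE do.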
>
> You've added a question about which command should be used to
> compare means. If you don't have survey data, why -svyset-?
> There are non-survey options, including:
>
> -mean-, with cluster() option, followed by -lincom-
> -reg- with cluster() option
> -clttest-
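Why these options can be compared directly: each estimates the same difference in means and differs only in how the clustered SE is computed. For a regression on a 0/1 indicator d_i, OLS reproduces the group means exactly. Here d_i = 1 for the intervention arm, as the -clttest- output suggests (intervention=0 carries the control mean 7.4342):

```latex
\[
y_i = \alpha + \beta d_i + \varepsilon_i, \qquad d_i \in \{0, 1\}
\]
\[
\hat{\alpha} = \bar{y}_{0}, \qquad
\hat{\beta} = \bar{y}_{1} - \bar{y}_{0} = 6.950725 - 7.434211 = -0.483486
\]
```

This is the -clttest- Diff(0-1) of 0.4835 with the sign reversed. The clustered SEs from the three commands may still differ somewhat, since each command may apply its own small-sample adjustment; the thread does not show which adjustment each one uses.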
>
>
> You don't appear to have compared any of these. Do so, and you'll
> be able to answer your question yourself.
>
> By the way, you are asked to give the source of contributed
> commands like -clttest-.
>
> Steve
>
> On Jul 28, 2013, at 6:03 PM, Afia Tasneem wrote:
>
> Hi Steve,
>
> I am confused. To be clear, sd's are not supposed to change with
> clustering, correct? se's are supposed to change with clustering.
>
> In a table reporting the mean and sd of classes for males and females,
> the difference between the two, and the se and p-value of the difference,
> where the cluster design of the experiment is taken into account for all
> numbers, which is the correct method to use (option 1 or 2 below)?
>
> Option 1:
> Numbers using the following code:
> svyset branch
> svy: mean `var', over(intervention)
> estat sd
> lincom [`var']intervention - [`var']control
>
> or
> Option 2:
> clttest `var', cluster(branch) by(intervention)
>
> Many thanks,
> Afia
>
>
> On Sun, Jul 28, 2013 at 5:05 PM, Steve Samuels <[email protected]> wrote:
>>
>> Afia:
>>
>> ------------------------------------------------------------------------
>> Intra-cluster correlation = 0.0465
>> ------------------------------------------------------------------------
>>                     N   Clusts      Mean        SE   95 % CI
>> intervention=0    380       11    7.4342    0.2763   [ 6.8186,  8.0498]
>> intervention=1    345       14    6.9507    0.2768   [ 6.3527,  7.5488]
>> ------------------------------------------------------------------------
>>
>>
>> r(sd_1) and r(sd_2) estimate the SDs that would give the same SEs if there
>> were no clustering:
>>
>> sd1 = n1^.5 x se1
>> sd2 = n2^.5 x se2
>>
>> sd1 = (380)^.5 x .2763
>> sd2 = (345)^.5 x .2768
>>
>> r(sd_2) = 5.141886711364611
>> r(sd_1) = 5.385836699859183
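As a check on these numbers, using only values posted in this thread: the unclustered SEs from -ttest- are .1626351 and .1485284, and the ordinary SDs are 3.170342 and 2.758793. In each arm the ratio SD*/s matches the ratio of the clustered to the unclustered SE, as the relation SE = SD*/sqrt(n) implies:

```latex
\[
\text{control: } \frac{5.3858}{3.1703} \approx 1.699 \approx \frac{0.27629}{0.16264},
\qquad
\text{intervention: } \frac{5.1419}{2.7588} \approx 1.864 \approx \frac{0.27683}{0.14853}
\]
```

Squaring these ratios gives variance inflation factors of roughly 2.89 and 3.47 for the two arms.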
>>
>> Steve
>>
>>
>> On Jul 28, 2013, at 3:45 PM, Afia Tasneem wrote:
>>
>> Dear Steve,
>>
>> Thank you for your reply. And apologies for not posting the code; I
>> am new to statalist.
>>
>> I would be grateful if you could also answer a few follow up questions:
>>
>> As you can see from the code below, the standard deviations with and
>> without clustering using svyset are almost the same (is there any reason
>> for the very slight difference?): 3.168354 and 2.756693 with clustering,
>> and 3.170342 and 2.758793 without clustering, for the control and
>> intervention groups respectively. However, the command clttest gives me
>> different sds before and after clustering: with clttest, my sds are
>> 5.385 and 5.141 for the control and intervention groups respectively,
>> whereas in ordinary ttests the sds are 3.170342 and 2.758793. Why do I
>> get different sds with svyset plus estat sd and with clttest?
>>
>> Below is the code:
>>
>> . svyset branch
>>
>>       pweight: <none>
>>           VCE: linearized
>>   Single unit: missing
>>      Strata 1: <one>
>>          SU 1: branch
>>         FPC 1: <zero>
>>
>> . svy: mean class, over(intervention)
>> (running mean on estimation sample)
>>
>> Survey: Mean estimation
>>
>> Number of strata   =         1        Number of obs     =        725
>> Number of PSUs     =        25        Population size   =        725
>>                                       Design df         =         24
>>
>> control: intervention = control
>> intervention: intervention = intervention
>>
>> --------------------------------------------------------------
>>              |             Linearized
>>         Over |       Mean   Std. Err.     [95% Conf. Interval]
>> -------------+------------------------------------------------
>> class        |
>>      control |   7.434211   .3031807      6.808476    8.059945
>> intervention |   6.950725   .2003743      6.537172    7.364277
>> --------------------------------------------------------------
>>
>> . estat sd
>>
>> control: intervention = control
>> intervention: intervention = intervention
>>
>> -------------------------------------
>>         Over |       Mean   Std. Dev.
>> -------------+-----------------------
>> class        |
>>      control |   7.434211    3.168354
>> intervention |   6.950725    2.756693
>> -------------------------------------
>>
>> . bysort intervention: sum class
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------
>> -> intervention = control
>>
>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>> -------------+--------------------------------------------------------
>>        class |       380    7.434211    3.170342          0         12
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------
>> -> intervention = intervention
>>
>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>> -------------+--------------------------------------------------------
>>        class |       345    6.950725    2.758793          0         12
>>
>> However, when I use the command clttest, my standard deviations do
>> change with clustering: with clttest, my sds are 5.385 and 5.141 for
>> the control and intervention groups respectively, whereas in ordinary
>> ttests the sds are 3.170342 and 2.758793.
>>
>> . clttest class, cluster(branch) by(intervention)
>>
>> t-test adjusted for clustering
>> class by intervention, clustered by branch
>> ------------------------------------------------------------------------
>> Intra-cluster correlation = 0.0465
>> ------------------------------------------------------------------------
>>                     N   Clusts      Mean        SE   95 % CI
>> intervention=0    380       11    7.4342    0.2763   [ 6.8186,  8.0498]
>> intervention=1    345       14    6.9507    0.2768   [ 6.3527,  7.5488]
>> ------------------------------------------------------------------------
>> Combined          725       14    7.2041    0.1957   [ 6.7992,  7.6091]
>> ------------------------------------------------------------------------
>> Diff(0-1)         725       25    0.4835    0.3911   [-0.3256,  1.2926]
>>
>> Degrees freedom: 23
>>
>> Ho: mean(-) = mean(diff) = 0
>>
>>   Ha: mean(diff) < 0        Ha: mean(diff) ~= 0        Ha: mean(diff) > 0
>>       t =  1.2362                t =  1.2362                t =  1.2362
>>   P < t =  0.8856            P > |t| =  0.2289            P > t =  0.1144
>>
>> . return list
>>
>> scalars:
>> r(N_2) = 345
>> r(N_1) = 380
>> r(df_t) = 23
>> r(t) = 1.2362
>> r(sd_2) = 5.141886711364611
>> r(sd_1) = 5.385836699859183
>> r(se) = .3911133002996737
>> r(m_diff) = .4834856986999512
>> r(se_2) = .2768298747832084
>> r(se_1) = .2762875930960634
>> r(mu_2) = 6.950724601745606
>> r(mu_1) = 7.434210300445557
>> r(p_l) = .8855657157257124
>> r(p_u) = .1144342842742876
>> r(p) = .2288685685485752
>>
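The SE of the difference in this output can be reproduced from the two group SEs. This is a sketch based only on the returned scalars, and it assumes the two arm means are independent, which holds when every cluster (branch) lies entirely within one arm, as in a cluster-randomized trial:

```latex
\[
\mathrm{SE}_{\mathrm{diff}} = \sqrt{\mathrm{se}_1^{2} + \mathrm{se}_2^{2}}
 = \sqrt{0.2762876^{2} + 0.2768299^{2}} \approx 0.3911133
\]
\[
t = \frac{\bar{y}_1 - \bar{y}_2}{\mathrm{SE}_{\mathrm{diff}}}
 = \frac{0.4834857}{0.3911133} \approx 1.2362,
\qquad
\mathrm{df} = 11 + 14 - 2 = 23
\]
```

Both values agree with r(se), r(t) and r(df_t) above. The df line reads 11 and 14 as the per-arm cluster counts from the N and Clusts columns; that reading is an inference from the output, consistent with r(df_t), not something documented in the thread.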
>> . ttest class, by(intervention)
>>
>> Two-sample t test with equal variances
>> ------------------------------------------------------------------------------
>>    Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
>> ---------+--------------------------------------------------------------------
>>  control |     380    7.434211    .1626351    3.170342     7.11443    7.753991
>> interven |     345    6.950725    .1485284    2.758793    6.658586    7.242863
>> ---------+--------------------------------------------------------------------
>> combined |     725    7.204138    .1110214    2.989343    6.986176      7.4221
>> ---------+--------------------------------------------------------------------
>>     diff |            .4834859    .2217278                .0481787    .9187931
>> ------------------------------------------------------------------------------
>>     diff = mean(control) - mean(interven)                        t =   2.1805
>> Ho: diff = 0                                     degrees of freedom =      723
>>
>>     Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
>>  Pr(T < t) = 0.9852         Pr(|T| > |t|) = 0.0295          Pr(T > t) = 0.0148
>>
>> Very grateful for your help.
>>
>> Best regards,
>> Afia
>>
>>
>>
>>
>> On Fri, Jul 26, 2013 at 5:25 PM, Steve Samuels <[email protected]> wrote:
>>>
>>> The Statalist FAQ requests that you show both your code and results. As
>>> you didn't, we have little idea of what you saw. I guess that your
>>> -svyset- didn't specify a probability weight.
>>>
>>> In that case, observations are equally weighted, and the estimated
>>> population standard deviation *and* mean must be identical to the sample
>>> versions, as given by -summarize-. Clustering, as you noticed, affects
>>> only standard errors. The following shows that the sd and mean are
>>> affected only by weighting and not by clustering.
>>>
>>>
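>>> . sysuse auto, clear
>>> . * clusters (PSUs): the first two letters of the make
>>> . gen mkr = substr(make,1,2)
>>>
>>> . * clustered design, no weights; compare with unweighted -sum-
>>> . svyset mkr
>>> . svy: mean turn
>>> . estat sd
>>> . sum turn
>>>
>>> . * same clusters plus pweights; compare with -sum- using the same weights
>>> . svyset mkr [pw = price]
>>> . svy: mean turn
>>> . estat sd
>>> . sum turn [aw = price]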
>>>
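The same point written out as formulas. Let w_i be the probability weight of observation i (notation introduced here for illustration; with no pweight, every w_i equals the same constant c). The weighted mean and SD depend only on the weights and the data:

```latex
\[
\bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i},
\qquad
w_i = c \;\Longrightarrow\; \bar{y}_w = \frac{c \sum_{i} y_i}{n c} = \bar{y}
\]
\[
\hat{\sigma}_w^{2} = \frac{1}{D} \sum_{i=1}^{n} w_i \left(y_i - \bar{y}_w\right)^{2},
\qquad
w_i = c \;\Longrightarrow\; \hat{\sigma}_w^{2} = \frac{c}{D} \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^{2}
\]
```

Here D is a normalizing divisor (for example, the sum of the weights with some small-sample correction); its exact form is an implementation detail not given in this thread, and a different choice of D is one possible source of tiny discrepancies such as 3.168354 versus 3.170342. The cluster identifier appears in neither formula, so clustering in -svyset- cannot change the mean or SD; it enters only through the variance of the mean, that is, the SE.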
>>> Steve
>>>
>>> On Jul 26, 2013, at 12:25 PM, Afia Tasneem wrote:
>>>
>>> Dear all,
>>>
>>> I am working on the analysis of a clustered randomized trial.
>>>
>>> My standard errors change when I svyset the data to account for
>>> clustering. However, the standard deviations from estat sd after
>>> clustering with svyset are the same as before clustering (also the
>>> same as simply using: sum var). Should the sd remain unaffected when
>>> the se changes due to clustering? Or is the command "estat sd" not the
>>> right one to use to find standard deviations after clustering?
>>>
>>> Thanks much,
>>> Afia
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/