Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: should estat sd reports same sd before and after clustering?

From	Afia Tasneem <[email protected]>
To	[email protected]
Subject	Re: st: should estat sd reports same sd before and after clustering?
Date	Sun, 28 Jul 2013 23:44:56 -0400

Steve, very grateful for your help. Thank you.

On Sun, Jul 28, 2013 at 7:52 PM, Steve Samuels <[email protected]> wrote:
> "To be clear, sd's are not supposed to change with clustering, correct?"
> It depends on which ones. The sample & estimated population SDs do not
> change. The SD* returned by -clttest- is not the sample SD. It satisfies
> the equation:
>
> SE = SD*/sqrt(n)
>
> where the SE is from the *clustered* analysis. Some people find it useful for
> study planning or for characterizing the effect of clustering.
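>
> A minimal sketch of that relationship, assuming the variable names used
> elsewhere in this thread (class, intervention coded 0/1, branch) and using
> official -mean- with a clustered VCE rather than -clttest-. The two commands
> estimate the SE differently, so the SD* computed here need not equal r(sd_1):
>
> ```stata
> * Clustered SE of the control-group mean (branch identifies the clusters)
> mean class if intervention == 0, vce(cluster branch)
>
> * SD* = SE * sqrt(n): the SD that would produce this SE under
> * simple random sampling with the same n
> display "SD* = " sqrt(e(N)) * _se[class]
>
> * Ordinary sample SD, for comparison
> summarize class if intervention == 0
> display "sample SD = " r(sd)
> ```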
>
> You've added a question about which command should be used to
> compare means. If you don't have survey data, why -svyset-?
> There are non-survey options, including:
>
> -mean-, with cluster() option, followed by -lincom-
> -reg- with  cluster() option
> -clttest-
>
>
> You don't appear to have compared any of these. Do so, and you'll
> be able to answer your question yourself.
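>
> A sketch of such a comparison, assuming the thread's variable names
> (class, intervention coded 0/1, branch). Coefficient names after -mean,
> over()- differ across Stata versions, so the coeflegend option is used to
> display them instead of hard-coding a -lincom- expression:
>
> ```stata
> * 1. -mean- with clustered SEs; coeflegend lists the names -lincom- expects
> mean class, over(intervention) vce(cluster branch) coeflegend
> * then: lincom <control name> - <intervention name>, using the names shown
>
> * 2. -regress- with clustered SEs; the coefficient on 1.intervention is
> *    the intervention-minus-control difference in means
> regress class i.intervention, vce(cluster branch)
>
> * 3. user-written -clttest- (must be installed; syntax as used in this thread)
> clttest class, cluster(branch) by(intervention)
> ```
>
> All three report the same difference in means; what differs is the SE of
> the difference and the degrees of freedom used for the test.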
>
> By the way, you are asked to give the source of contributed
> commands like -clttest-.
>
> Steve
>
> On Jul 28, 2013, at 6:03 PM, Afia Tasneem wrote:
>
> Hi Steve,
>
> I am confused. To be clear, sd's are not supposed to change with
> clustering, correct? se's are supposed to change with clustering.
>
> In a table reporting mean, sd of classes for males and females, the
> difference between the two, se and p-value of the difference, where
> the cluster design of the experiment is taken into account for all
> numbers, what's the correct method to use (option 1 or 2 below):
>
> Option 1:
> Numbers using the following code:
> svyset branch
> svy: mean `var', over(intervention)
> estat sd
> lincom [`var']intervention - [`var']control
>
> or
> Option 2:
> clttest `var', cluster(branch) by(intervention)
>
> Many thanks,
> Afia
>
>
> On Sun, Jul 28, 2013 at 5:05 PM, Steve Samuels <[email protected]> wrote:
>>
>> Afia:
>>
>> ------------------------------------------------------------------------
>> Intra-cluster correlation         =           0.0465
>> ------------------------------------------------------------------------
>>                    N  Clusts      Mean          SE             95 % CI
>> intervention=0   380      11    7.4342      0.2763       [  6.8186,  8.0498]
>> intervention=1   345      14    6.9507      0.2768       [  6.3527,  7.5488]
>> ------------------------------------------------------------------------
>>
>>
>> r(sd_1) and r(sd_2) estimate the SDs that would give the same SEs if there
>> were no clustering:
>>
>> sd1 = n1^.5 x se1
>> sd2 = n2^.5 x se2
>>
>> sd1 = (380)^.5 x .2763
>> sd2 = (345)^.5 x .2768
>>
>> r(sd_2) =  5.141886711364611
>> r(sd_1) =  5.385836699859183
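>>
>> The arithmetic can be checked directly in Stata with the returned values
>> quoted in this thread; no dataset is needed. The tolerance in the -assert-
>> lines is an arbitrary choice:
>>
>> ```stata
>> * sd = sqrt(n) * se, using r(N_1), r(se_1), r(N_2), r(se_2) from -clttest-
>> display sqrt(380) * .2762875930960634
>> display sqrt(345) * .2768298747832084
>>
>> * Both should reproduce r(sd_1) and r(sd_2) up to rounding
>> assert reldif(sqrt(380) * .2762875930960634, 5.385836699859183) < 1e-6
>> assert reldif(sqrt(345) * .2768298747832084, 5.141886711364611) < 1e-6
>> ```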
>>
>> Steve
>>
>>
>> On Jul 28, 2013, at 3:45 PM, Afia Tasneem wrote:
>>
>> Dear Steve,
>>
>> Thank you for your reply.  And apologies for not posting the code; I
>> am new to statalist.
>>
>> I would be grateful if you could also answer a few follow up questions:
>>
>> As you can see from the output below, the standard deviations with and
>> without clustering using svyset are almost the same (any reason for the
>> very slight difference?): 3.168354 and 2.756693 with clustering, and
>> 3.170342 and 2.758793 without clustering, for the control and intervention
>> groups respectively. However, the command clttest gives me different
>> sds after clustering: with clttest, my sds are 5.385 and 5.141 for the
>> control and intervention groups respectively, whereas with a normal
>> ttest the sds are 3.170342 and 2.758793. Why do I get different sds
>> with svyset plus estat sd and with clttest?
>>
>> below is the code:
>>
>> . svyset branch
>>
>>     pweight: <none>
>>         VCE: linearized
>> Single unit: missing
>>    Strata 1: <one>
>>        SU 1: branch
>>       FPC 1: <zero>
>>
>> . svy: mean class, over(intervention)
>> (running mean on estimation sample)
>>
>> Survey: Mean estimation
>>
>> Number of strata =       1          Number of obs    =     725
>> Number of PSUs   =      25          Population size  =     725
>>                                   Design df        =      24
>>
>>     control: intervention = control
>> intervention: intervention = intervention
>>
>> --------------------------------------------------------------
>>            |             Linearized
>>       Over |       Mean   Std. Err.     [95% Conf. Interval]
>> -------------+------------------------------------------------
>> class        |
>>    control |   7.434211   .3031807      6.808476    8.059945
>> intervention |   6.950725   .2003743      6.537172    7.364277
>> --------------------------------------------------------------
>>
>> . estat sd
>>
>>     control: intervention = control
>> intervention: intervention = intervention
>>
>> -------------------------------------
>>       Over |       Mean   Std. Dev.
>> -------------+-----------------------
>> class        |
>>    control |   7.434211    3.168354
>> intervention |   6.950725    2.756693
>> -------------------------------------
>>
>> . bysort intervention: sum class
>>
>> ----------------------------------------------------------------------
>> -> intervention = control
>>
>>   Variable |       Obs        Mean    Std. Dev.       Min        Max
>> -------------+--------------------------------------------------------
>>      class |       380    7.434211    3.170342          0         12
>>
>> ----------------------------------------------------------------------
>> -> intervention = intervention
>>
>>   Variable |       Obs        Mean    Std. Dev.       Min        Max
>> -------------+--------------------------------------------------------
>>      class |       345    6.950725    2.758793          0         12
>>
>> However, when I use the command "clttest," my standard deviations do
>> change with clustering:
>>
>> With clttest, my sds are 5.385 and 5.141 for the control and intervention
>> groups respectively, whereas with a normal ttest the sds are 3.170342 and
>> 2.758793 for the control and intervention groups respectively.
>>
>> . clttest class, cluster(branch) by(intervention)
>>
>> t-test adjusted for clustering
>> class by intervention, clustered by branch
>> ------------------------------------------------------------------------
>> Intra-cluster correlation         =           0.0465
>> ------------------------------------------------------------------------
>>                    N  Clusts      Mean          SE             95 % CI
>> intervention=0   380      11    7.4342      0.2763       [  6.8186,  8.0498]
>> intervention=1   345      14    6.9507      0.2768       [  6.3527,  7.5488]
>> ------------------------------------------------------------------------
>> Combined         725      14    7.2041      0.1957       [  6.7992,  7.6091]
>> ------------------------------------------------------------------------
>> Diff(0-1)        725      25    0.4835      0.3911       [ -0.3256,  1.2926]
>>
>> Degrees freedom:    23
>>
>>                   Ho: mean(-) = mean(diff) = 0
>>
>> Ha: mean(diff) < 0         Ha: mean(diff) ~= 0        Ha: mean(diff) > 0
>>      t =   1.2362                t =   1.2362              t =   1.2362
>>  P < t =   0.8856          P > |t| =   0.2289          P > t =   0.1144
>>
>> . return list
>>
>> scalars:
>>               r(N_2) =  345
>>               r(N_1) =  380
>>              r(df_t) =  23
>>                 r(t) =  1.2362
>>              r(sd_2) =  5.141886711364611
>>              r(sd_1) =  5.385836699859183
>>                r(se) =  .3911133002996737
>>            r(m_diff) =  .4834856986999512
>>              r(se_2) =  .2768298747832084
>>              r(se_1) =  .2762875930960634
>>              r(mu_2) =  6.950724601745606
>>              r(mu_1) =  7.434210300445557
>>               r(p_l) =  .8855657157257124
>>               r(p_u) =  .1144342842742876
>>                 r(p) =  .2288685685485752
>>
>> . ttest class, by(intervention)
>>
>> Two-sample t test with equal variances
>> ------------------------------------------------------------------------------
>>  Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
>> ---------+--------------------------------------------------------------------
>> control |     380    7.434211    .1626351    3.170342     7.11443    7.753991
>> interven |     345    6.950725    .1485284    2.758793    6.658586    7.242863
>> ---------+--------------------------------------------------------------------
>> combined |     725    7.204138    .1110214    2.989343    6.986176      7.4221
>> ---------+--------------------------------------------------------------------
>>   diff |            .4834859    .2217278                .0481787    .9187931
>> ------------------------------------------------------------------------------
>>   diff = mean(control) - mean(interven)                         t =   2.1805
>> Ho: diff = 0                                     degrees of freedom =      723
>>
>>   Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
>> Pr(T < t) = 0.9852         Pr(|T| > |t|) = 0.0295          Pr(T > t) = 0.0148
>>
>> Very grateful for your help.
>>
>> Best regards,
>> Afia
>>
>>
>>
>>
>> On Fri, Jul 26, 2013 at 5:25 PM, Steve Samuels <[email protected]> wrote:
>>>
>>> The Statalist FAQ requests that you show both your code and results. As
>>> you didn't, we have little idea of what you saw. I guess that your
>>> -svyset- didn't specify a probability weight.
>>>
>>> In that case, observations are equally weighted, and the estimated
>>> population standard deviation *and* mean must be identical to the sample
>>> versions, as given by -summarize-. Clustering, as you noticed, affects
>>> only standard errors. The following shows that the sd and mean are
>>> affected only by weighting and not by clustering.
>>>
>>>
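>>> . sysuse auto, clear
>>> . * crude cluster id: first two letters of the make
>>> . gen mkr = substr(make,1,2)
>>>
>>> . * clustering only: compare mean and sd with -summarize-
>>> . svyset mkr
>>> . svy: mean turn
>>> . estat sd
>>> . sum turn
>>>
>>> . * clustering plus weights: compare with weighted -summarize-
>>> . svyset mkr [pw = price]
>>> . svy: mean turn
>>> . estat sd
>>> . sum turn [aw = price]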
>>>
>>> Steve
>>>
>>> On Jul 26, 2013, at 12:25 PM, Afia Tasneem wrote:
>>>
>>> Dear all,
>>>
>>> I am working on the analysis of a clustered randomized trial.
>>>
>>> My standard errors change when I svyset the data to account for
>>> clustering. However, the standard deviations reported by estat sd after
>>> svyset are the same as before clustering (and the same as those from
>>> simply using: sum var). Should the sd remain unaffected when the se
>>> changes due to clustering? Or is the command "estat sd" not the
>>> right one to use to find standard deviations after clustering?
>>>
>>> Thanks much,
>>> Afia
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
