Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable
From
Stas Kolenikov <[email protected]>
To
[email protected]
Subject
Re: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable
Date
Wed, 5 Jan 2011 19:02:08 -0600
The -robust- and -cluster()- standard errors exhibit terrible small-sample
biases when the number of observations and the number of clusters,
respectively, is small. As Justina noted, four clusters is SO far away
from asymptotics that I wouldn't even consider the clustered standard
errors in your situation.
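A sketch of the algebra behind this, assuming plain OLS with the usual sandwich estimator (the notation is mine, not from the thread). Let x̃_i be the residual from regressing x on the other regressors (the constant, plus z in models 3 and 4), e_i the OLS residual, N the number of observations, K the number of coefficients, and G the number of clusters. By the Frisch-Waugh-Lovell theorem, the conventional and clustered variance estimates for the coefficient on x are:

```latex
\widehat{V}_{\mathrm{conv}}(\hat\beta_x) = \frac{s^2}{\sum_i \tilde x_i^2},
\qquad s^2 = \frac{1}{N-K}\sum_i e_i^2

\widehat{V}_{\mathrm{clus}}(\hat\beta_x)
  = c\,\frac{\sum_{g=1}^{G}\bigl(\sum_{i\in g}\tilde x_i e_i\bigr)^2}{\bigl(\sum_i \tilde x_i^2\bigr)^2},
\qquad c = \frac{G}{G-1}\cdot\frac{N-1}{N-K}

\sum_{g=1}^{G}\Bigl(\sum_{i\in g}\tilde x_i e_i\Bigr)^2
  = \sum_i \tilde x_i^2 e_i^2
  + \sum_{g=1}^{G}\sum_{\substack{i,j\in g\\ i\neq j}} \tilde x_i e_i\,\tilde x_j e_j
```

Apart from the heteroskedasticity term and the factor c, the clustered SE exceeds the conventional one when the within-cluster cross products of the scores x̃_i e_i are mostly positive, and falls below it when they are mostly negative. What matters is the within-cluster correlation of x̃_i e_i, not of the residuals alone. Adding z changes both x̃_i and e_i, so the sign of those cross products can flip. With G = 4, the numerator is a sum of only four squared terms, which is why it is so noisy. The factor c is, I believe, the default adjustment Stata documents for -regress- with -vce(cluster)-; treat it as an assumption for other estimation commands.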
On Wed, Jan 5, 2011 at 6:01 PM, Jacob Felson <[email protected]> wrote:
> I wonder if anyone might be able to provide an explanation for the
> following scenario. I'm wondering why the direction in which clustering
> changes a standard error depends on whether another control variable
> is included. My inquiry is more theoretical than practical: I'm not
> asking "what should I do?" but simply "why is this happening?" Let me
> elaborate below.
>
> Consider the following variables:
>
> y, the dependent variable
> x, the independent variable of greatest interest, which is moderately
> correlated with y and with z
> z, another independent variable, which is correlated with y at about 0.5.
>
> nation, the clustering variable: the data were collected in 4 different
> nations by different organizations.
>
>
> I am examining the standard errors (SE) for the coefficient of
> variable x from the following four models:
>
> 1. Regress y on x, without clustering on nation.
> 2. Regress y on x, with clustering on nation.
>
> 3. Regress y on x and z without clustering on nation.
> 4. Regress y on x and z with clustering on nation.
>
>
> The SE of the coefficient for x is LARGER in model 2 than in model 1.
> This suggests there is a positive intracluster (within-nation)
> correlation. That is, the residuals are more similar to each other
> within nations than we would expect by chance alone. I suppose there is
> a preponderance of positive residuals in some nations and a
> preponderance of negative residuals in other nations.
>
> The SE of the coefficient for x is SMALLER in model 4 than in model 3.
> This suggests there is a negative intracluster (within-nation)
> correlation. That is, the residuals are less similar to each other
> within nations than we would expect by chance.
>
>
> So the effect that clustering on nation has on the SE of x depends on
> whether a third variable, z, is controlled for. Why is this?
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
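A minimal pure-Python sketch of the four models, for anyone who wants to see the mechanism numerically. It assumes OLS with at most one control, uses the Frisch-Waugh-Lovell shortcut instead of a full matrix inverse, and applies a G/(G-1)*(N-1)/(N-K) small-sample factor of the kind Stata documents for -regress- with clustering (an assumption here, not a claim about every command). The function names `residualize` and `se_of_x` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def residualize(v, controls):
    """Residuals of v from OLS on a constant plus at most one control variable."""
    n = len(v)
    v_bar = sum(v) / n
    if not controls:
        return [vi - v_bar for vi in v]
    if len(controls) > 1:
        raise ValueError("this sketch handles at most one control")
    w = controls[0]
    w_bar = sum(w) / n
    s_ww = sum((wi - w_bar) ** 2 for wi in w)
    s_wv = sum((wi - w_bar) * (vi - v_bar) for wi, vi in zip(w, v))
    slope = s_wv / s_ww
    return [(vi - v_bar) - slope * (wi - w_bar) for wi, vi in zip(w, v)]


def se_of_x(y, x, controls, cluster):
    """Coefficient on x plus conventional and cluster-robust standard errors.

    Frisch-Waugh-Lovell: partial the constant and controls out of x and y,
    then work with residualized x (x_t) and the full-model residuals (e).
    """
    n = len(y)
    k = 2 + len(controls)  # constant + x + controls
    x_t = residualize(x, controls)
    y_t = residualize(y, controls)
    s_xx = sum(xi * xi for xi in x_t)
    beta = sum(xi * yi for xi, yi in zip(x_t, y_t)) / s_xx
    e = [yi - beta * xi for xi, yi in zip(x_t, y_t)]  # equals full-model residuals

    # Conventional (homoskedastic, independent errors): s^2 / sum(x_t^2)
    s2 = sum(ei * ei for ei in e) / (n - k)
    se_conv = math.sqrt(s2 / s_xx)

    # Cluster-robust: square the within-cluster sums of the scores x_t * e
    score_sums = defaultdict(float)
    for g, xi, ei in zip(cluster, x_t, e):
        score_sums[g] += xi * ei
    n_groups = len(score_sums)
    if n_groups < 2:
        raise ValueError("need at least two clusters")
    # Assumed small-sample factor, modeled on Stata's -regress, vce(cluster)-
    c = n_groups / (n_groups - 1) * (n - 1) / (n - k)
    meat = sum(s * s for s in score_sums.values())
    se_clus = math.sqrt(c * meat / s_xx ** 2)
    return beta, se_conv, se_clus, dict(score_sums)


if __name__ == "__main__":
    # Hypothetical toy data: 4 "nations" with 3 observations each.
    nation = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
    x = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 4.0, 5.0, 6.0]
    z = [0.5, 1.5, 1.0, 2.0, 2.5, 3.5, 2.0, 3.0, 4.5, 3.5, 4.0, 5.5]
    y = [1.2, 2.9, 2.1, 3.0, 3.8, 5.1, 3.1, 4.4, 6.0, 5.2, 5.1, 7.3]
    for label, controls in (("y on x", []), ("y on x and z", [z])):
        b, se_c, se_cl, sums = se_of_x(y, x, controls, nation)
        print(f"{label}: b={b:.3f} conventional SE={se_c:.3f} "
              f"clustered SE={se_cl:.3f} per-nation score sums={sums}")
```

Printing the per-nation score sums next to the two SEs shows the point: adding z changes those four numbers, and with only four of them the clustered SE can move in either direction relative to the conventional one.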
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.