Statalist The Stata Listserver



Re: st: Clustered standard errors in -xtreg-


From: Thomas Cornelißen <[email protected]>
To: [email protected]
Subject: Re: st: Clustered standard errors in -xtreg-
Date: Thu, 28 Dec 2006 13:28:45 +0100

Thomas Cornelissen wrote:

>> I am comparing two different ways of estimating a linear fixed-effects
>> model:
>>
>> Method 1: Use -regress- and include dummy variables for the panels.
>> Method 2: Use -xtreg, fe-.
>>
>> These two deliver exactly the same estimates of coefficients and their
>> standard errors (if I do not cluster the standard errors).
>>
>> However, if I use the option -cluster- in order to get clustered
>> standard errors (clustered on the panel ID), I get different results
>> with the two ways of estimating the model.
>>
>> Why is this?

Clive wrote:

Probably because the degrees-of-freedom correction is different in each
case. In -reg-, it's (N of obs - k variables - 1); in -reg, cluster()-,
it's (N of clusters - 1). The resultant df is often very different.

Take a look at these posts for more on this:

http://www.stata.com/statalist/archive/2004-07/msg00616.html
http://www.stata.com/statalist/archive/2004-07/msg00620.html


Note that -areg- is the same as -xtreg, fe-!
Hope that helps.


Thanks Clive!
I understand from the Stata manuals that the degrees-of-freedom adjustment for
the clustered covariance matrix is given by the factor

    (N-1)/(N-K) * M/(M-1)

where
    M = number of clusters
    N = number of observations
    K = number of regressors
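
As a rough numerical check of this factor, the following sketch evaluates it for the example further down (N = 100 observations, M = 15 clusters) under the two values of K considered later in this post: 16 when the dummies and the constant are counted, 2 when only x1 and the constant are counted. The scalar names are made up for illustration.

```stata
* Finite-sample factor q = (N-1)/(N-K) * M/(M-1) for the example data
scalar n_obs   = 100
scalar n_clust = 15

* K = 16: x1, the 14 dummies f2-f15 and the constant
display "q with K = 16: " (n_obs-1)/(n_obs-16) * n_clust/(n_clust-1)

* K = 2: x1 and the constant only (absorbed regressors not counted)
display "q with K = 2:  " (n_obs-1)/(n_obs-2) * n_clust/(n_clust-1)

* Standard errors scale with sqrt(q); everything but N-K cancels in the ratio
display "ratio of SEs:  " sqrt((n_obs-2)/(n_obs-16))
```

Only N-K differs between the two cases, so the ratio of the standard errors reduces to sqrt(98/84).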

M should be the same in -reg- and -areg-, but I have the impression that
K is counted differently in -areg- when standard errors are clustered.
(The same applies to -xtreg, fe-.)

More precisely, if I don't cluster, -areg- seems to include the absorbed regressors
in the count for K, but if I do cluster, it counts only the explicit regressors.
In -reg-, clustering makes no difference to K, since all regressors are explicit anyway.
The consequence is that the estimated standard errors are the same in -reg- and -areg-
if I don't cluster, but they differ if I cluster.

This is shown in the following output, where -areg- and -reg- give different standard errors
(clustered in both cases).

This differs from the thread Clive suggested,
http://www.stata.com/statalist/archive/2004-07/msg00620.html
where Garrett gets similar standard errors from -areg- and -reg- when clustering,
but different confidence intervals and t-test results. Was that perhaps based on a different version of -areg-?

(In the following, the dummies f1-f15 correspond to the 15 categories of j.)

. reg y x1 f2- f15, cluster(j)

Linear regression                                      Number of obs =     100
                                                       F(  0,    14) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.6101
Number of clusters (j) = 15                            Root MSE      =  7.2941

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.137686    .241541     4.71   0.000     .6196322     1.65574
          f2 |   5.545925   .3450585    16.07   0.000     4.805848    6.286002
          f3 |    2.58378   .1509631    17.12   0.000     2.259996    2.907563
          f4 |    15.3432   .3220546    47.64   0.000     14.65246    16.03393
          f5 |   12.46324   .2683788    46.44   0.000     11.88762    13.03885
          f6 |    2.81987   .0483082    58.37   0.000      2.71626    2.923481
          f7 |   13.17254   .5434672    24.24   0.000     12.00692    14.33816
          f8 |    10.3462   .6642376    15.58   0.000     8.921549    11.77084
          f9 |    11.5064   1.207705     9.53   0.000     8.916134    14.09667
         f10 |  -5.803007    .507236   -11.44   0.000     -6.89092   -4.715094
         f11 |   12.73337   .0268379   474.45   0.000     12.67581    12.79093
         f12 |   5.960424   .5313901    11.22   0.000     4.820706    7.100143
         f13 |   19.27186   .5175878    37.23   0.000     18.16175    20.38198
         f14 |   10.34177   .2787011    37.11   0.000     9.744018    10.93953
         f15 |   25.99612   .1449246   179.38   0.000     25.68529    26.30695
       _cons |  -11.55165    .241541   -47.82   0.000     -12.0697   -11.03359
------------------------------------------------------------------------------

. areg y x1, absorb(j) cluster(j)

Linear regression, absorbing indicators                Number of obs =     100
                                                       F(  1,    14) =   25.88
                                                       Prob > F      =  0.0002
                                                       R-squared     =  0.6101
                                                       Adj R-squared =  0.6061
                                                       Root MSE      =  7.2941

                                     (Std. Err. adjusted for 15 clusters in j)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.137686   .2236235     5.09   0.000     .6580614    1.617311
       _cons |   -2.28529   .0715595   -31.94   0.000    -2.438769    -2.13181
-------------+----------------------------------------------------------------
           j |   absorbed                                      (15 categories)
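
A side check on the two clustered outputs: the reported confidence intervals for x1 appear to use the same t quantile in both commands, with M - 1 = 14 degrees of freedom, so the intervals differ only because the standard errors differ. The sketch below retypes the numbers from the two tables above; -invttail()- is Stata's built-in upper-tail t quantile function, and the results should match the printed intervals only up to rounding.

```stata
* t quantile for a 95% interval with 14 degrees of freedom
display invttail(14, 0.025)

* -areg, absorb(j) cluster(j)-: interval for x1
display 1.137686 - invttail(14, 0.025)*.2236235
display 1.137686 + invttail(14, 0.025)*.2236235

* -reg, cluster(j)-: interval for x1
display 1.137686 - invttail(14, 0.025)*.241541
display 1.137686 + invttail(14, 0.025)*.241541
```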


Note that the standard errors on the coefficient of x1 differ in the two regressions.

I count 16 regressors in -regress- (x1, the 14 dummies and the constant), and 2 explicit
regressors in -areg- (x1 and the constant). So N-K is 84 in -regress-, while in -areg- it
would be 98 if the absorbed regressors are not counted.

I can convert one standard error into the other using these two values of N-K:
. di .2236235 *sqrt(98/84)
.24154099

That's why I think that, when standard errors are clustered, -areg- / -xtreg- does not
count the absorbed regressors in computing N-K.
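
The same comparison could be run on the stored estimates instead of retyping the numbers. This is only a sketch: it assumes the dataset from this post (y, x1, j and the dummies f2-f15) is in memory, and the scalar names se_reg and se_areg are invented here. Whether the ratio comes out as sqrt(98/84) depends on how the installed version of -areg- counts K.

```stata
* Clustered standard error of x1 with explicit dummies
quietly regress y x1 f2-f15, cluster(j)
scalar se_reg = _se[x1]

* Clustered standard error of x1 with absorbed dummies
quietly areg y x1, absorb(j) cluster(j)
scalar se_areg = _se[x1]

* Compare the observed ratio with the one implied by N-K = 98 versus 84
display "observed ratio: " se_reg/se_areg
display "sqrt(98/84):    " sqrt(98/84)
```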

However, when I do not cluster, standard errors are exactly the same:

. reg y x1 f2- f15

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F( 15,    84) =    8.76
       Model |  6993.20799    15  466.213866           Prob > F      =  0.0000
    Residual |  4469.17468    84  53.2044604           R-squared     =  0.6101
-------------+------------------------------           Adj R-squared =  0.5405
       Total |  11462.3827    99  115.781643           Root MSE      =  7.2941

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.137686   .2679358     4.25   0.000     .6048663    1.670506
...
(output omitted)
------------------------------------------------------------------------------

. areg y x1, absorb(j)

Linear regression, absorbing indicators                Number of obs =     100
                                                       F(  1,    84) =   18.03
                                                       Prob > F      =  0.0001
                                                       R-squared     =  0.6101
                                                       Adj R-squared =  0.5405
                                                       Root MSE      =  7.2941

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.137686   .2679358     4.25   0.000     .6048663    1.670506
       _cons |   -2.28529   .7344357    -3.11   0.003    -3.745796   -.8247835
-------------+----------------------------------------------------------------
           j |      F(14, 84) =      8.012   0.000          (15 categories)


So in that case, -areg- does seem to take the absorbed regressors into account
when computing N-K.

Is there a rationale for not counting the absorbed regressors when standard errors are clustered?

Haven't degrees of freedom been used up in absorbing the variables, so that the absorbed
regressors should always be counted as well?

Thomas



