Let's say your -xtreg, fe- model is properly specified. Then you can
question the i.i.d. error assumption implicit in this
regression-with-dummy-variables. Should you allow for robust s.e.?
Should you allow for cluster-robust s.e.? Should you allow for serial
correlation within units' errors? That is an issue of starting with a
valid set of point estimates and then trying to get the VCE right. We
want both consistent point and interval estimates, and worrying about
the VCE is not useful if the point estimates lack desirable properties.
At any rate, the basic formulation is
Y_{it} = X_{it} b + u_i + e_{it}
where we of course have to assume everything is alright with the X's:
no omitted variables, forgotten nonlinear terms, measurement error, or
whatever. The standard assumptions are:
OLS: Var[u]=0, Var[e] = const>0, corr[X,u]=corr[X,e]=0
GLS == RE == xtreg, re: Var[u] != 0, Var[e] = const>0, corr[X,u]=corr[X,e]=0
Those assumptions are crucial in deriving the corresponding
estimators, as they guarantee that the estimates are efficient when
the conditions are satisfied.
FE == xtreg, fe: Var[u] != 0, Var[e]=const>0, corr[X,u] does not
matter, corr[X,e]=0
The third assumption is what makes FE attractive: it is robust against
endogenous regressors, i.e., those correlated with the unit effects.
However, there is still the second assumption, which can in fact also
be written as Var[e as a vector] = \sigma^2_e * I_T, where I_T is the
T x T identity matrix. Well, the second assumption is needed for
establishing the "standard" standard errors; you don't really need it
for the point estimate itself.
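To illustrate that division of labor, here is a quick simulation
sketch (mine, in Python with numpy, not anything from Stata; the DGP
and all names are made up, and the estimators are written out by hand):
the within transformation recovers b even when the u's are correlated
with the X's, while pooled OLS does not.

```python
# Toy illustration (my own DGP, not from the post):
# Y_it = X_it*b + u_i + e_it with corr[X, u] != 0.
import numpy as np

rng = np.random.default_rng(42)
N, T, b = 200, 5, 1.0
u = rng.normal(size=N)                    # unit effects
x = u[:, None] + rng.normal(size=(N, T))  # regressor correlated with u
e = rng.normal(size=(N, T))               # idiosyncratic errors
y = b * x + u[:, None] + e

# Pooled OLS (no intercept; x is mean zero) ignores u_i -- inconsistent here
b_ols = (x.ravel() @ y.ravel()) / (x.ravel() @ x.ravel())

# FE / within estimator: demeaning within each unit wipes out u_i
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = (xd.ravel() @ yd.ravel()) / (xd.ravel() @ xd.ravel())

print(b_ols, b_fe)  # b_fe is near the true b = 1; b_ols drifts upward
```

The point-estimate question (b_fe vs. b_ols) is entirely separate from
how one then estimates Var[b_fe], which is where the second assumption
comes in.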
What the -cluster- correction does is relax that second assumption,
implying, however, that there is a population to which the sample can
be generalized, and that there is some suitable sense in which
Average{ Var[e_i as a vector] } = \Sigma. One such sense that I am
fairly closely familiar with is that of finite population sampling,
although you can probably just envision a population of units that may
have different variance-covariance matrices but finite higher-order
moments, thus assuring the existence of the above expectation.
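A sketch of that averaging in code (my own construction: one
regressor, no constant, and no small-sample G/(G-1) type correction):
the "meat" of the sandwich is built from whole-cluster score sums
X_g' e_g, so whatever correlation structure sits inside a cluster gets
absorbed without being modelled.

```python
# Hypothetical example: cluster-robust (sandwich) vs. "standard" VCE.
import numpy as np

rng = np.random.default_rng(1)
G, T = 100, 8                              # clusters x obs per cluster
w = rng.normal(size=G)                     # cluster-level part of x
x = 0.7 * w[:, None] + rng.normal(size=(G, T))
u = rng.normal(size=G)                     # cluster effect in the error
y = 1.5 * x + u[:, None] + rng.normal(size=(G, T))

X = x.reshape(-1, 1)
Y = y.ravel()
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ Y                   # point estimate: plain OLS
resid = (Y - X @ beta).reshape(G, T)

# meat: sum over clusters g of (X_g' e_g)(X_g' e_g)'
scores = np.einsum('gt,gt->g', x, resid)   # X_g' e_g (scalar per cluster)
v_cluster = XtX_inv @ (np.sum(scores**2) * XtX_inv)

# "standard" VCE assumes i.i.d. errors and understates the variance here
v_iid = (resid.ravel() @ resid.ravel() / (G * T - 1)) * XtX_inv

print(np.sqrt(v_cluster[0, 0]), np.sqrt(v_iid[0, 0]))
```

With both x and the error sharing a cluster-level component, the
cluster-robust standard error comes out noticeably larger than the
i.i.d. one, as it should.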
It probably takes Ray Carroll... or at least James Hardin... to
clarify that. The finite population sense has a couple of kinda
opposing implications. On one hand, it means that the covariance
matrices of the e's may vary from one panel to another, thus allowing
for, say, heteroskedasticity or some unmodelled autocorrelation
(although handled in an inefficient way, of course). On the other
hand, the interpretation of the standard errors is that they are with
respect to the sampling design rather than model-based, as in the
other methods above; i.e., they are due to the sampling process rather
than the randomness in the u's and e's, which is what the other models
use for inference... and one usually wants to avoid getting into
design issues in empirical publications in economics.
BTW, I am not surprised that the CGM paper did not find that the
proposed correction for both time and panel effects works on all
occasions. In fact, no such correction can be derived. The cluster
correction works at the highest level that encompasses all correlated
observations. (A sampling technicality here: if the first-stage
sampling is with replacement... If it is without replacement, the
estimator is conservative.) If you suspect that observations are
correlated both within panels and within time periods, the highest
encompassing level is the whole data set. That is, you only have one
effective degree of freedom, which will be consumed to estimate the
mean, and there won't be anything left for any regression models. The
proper way to analyze data that have both random time and panel
effects is through two-way panel data methods, of which I know
nothing, except that there is a chapter in Baltagi's book on them, and
they probably assume separability of the two effects (i.e., that
Cov[ error term_{it}, error term_{js} ] = Var_t(t,s) Var_u(i,j) ) --
this is highly relevant in spatio-temporal models, of which I saw
plenty while doing my dissertation. If the process is non-separable,
there are interactions between space and time (say, some diffusion
process gets slower over longer distances), and that makes the
analysis a total disaster.
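For what separability buys you, a tiny numerical sketch (my own toy
numbers): under separability, the covariance matrix of the stacked
errors is the Kronecker product of a cross-unit piece and a cross-time
piece, so every pairwise covariance Cov[e_{it}, e_{js}] factors.

```python
# Toy check of the separability assumption
# Cov[error_{it}, error_{js}] = C_u(i,j) * C_t(t,s).
import numpy as np

n_units, n_times = 4, 3
# made-up components: exchangeable across units, AR(1)-like in time
C_u = 0.5 * np.ones((n_units, n_units)) + 0.5 * np.eye(n_units)
C_t = 0.8 ** np.abs(np.subtract.outer(np.arange(n_times), np.arange(n_times)))

# stacked covariance, rows/cols ordered as (unit, time)
Sigma = np.kron(C_u, C_t)

# pick an arbitrary pair of observations and verify the factorization
i, t, j, s = 1, 2, 3, 0
lhs = Sigma[i * n_times + t, j * n_times + s]
print(lhs, C_u[i, j] * C_t[t, s])  # the two numbers coincide
```

A non-separable process is exactly one where no such factorization
exists; the full Sigma then has to be modelled (or clustered over) as
a whole, which is the disaster alluded to above.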
--
Stas Kolenikov
http://stas.kolenikov.name