
st: reg w cluster std errs

From	Kit Baum
Subject	st: reg w cluster std errs
Date	Mon, 17 Oct 2005 21:17:44 -0400

David said

I've run into the situation of not getting an overall F stat when running a
regression with clustered standard errors. Previous postings and Stata help
say that:
1) the number of estimated coefficients must be lower than the number of
clusters
2) each variable cannot have a non-zero value for just one observation

My final sample fulfils the above two.

I think the problem lies in having some of the dummy variables
corresponding to only one cluster. My data is clustered by 'case
investigation' (with several observations for each case) and each case falls
into a particular industry. I have industry dummy variables.

Some of these industry dummies have only one corresponding 'case
investigation'. When I get rid of these 'one case' industry dummies, I get
an F stat.

In a cluster covariance matrix estimator, each cluster contributes a single score vector, so for the purposes of the VCE you effectively have one observation per cluster. That is why condition (1) above is important.

In a standard regression, a dummy with a single 1 essentially removes that data point from the analysis. If you run the same regression without the dummy and without the observation to which it pertains, you get the same results for the other parameters, and N-k, Root MSE, etc. are unchanged. But the ANOVA F is messed up: it treats that dummy as a meaningful regressor rather than a nuisance, and so it miscounts the slopes. Try

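* assumes hours, kidslt6 and kidsge6 are in memory (e.g., the Mroz labor-supply data)
* dum equals 1 for observation 10 only
g dum=(_n==10)
reg hours kidslt6 kidsge6 dum
* same coefficients on kidslt6, kidsge6 and _cons as the regression above
reg hours kidslt6 kidsge6 if _n!=10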
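The cluster version of the problem can be reproduced on simulated data. This is a minimal sketch, not a recipe. The variable names (case, x, y, ind1), the seed, and the cluster sizes are invented for illustration, and the syntax assumes a recent Stata with vce(cluster ...) and rnormal(). Because the dummy is 1 for every observation of one "case", OLS forces that case's residuals to sum to zero. The dummy is 0 everywhere else, so its score summed within each cluster is zero in every cluster, and the cluster-robust VCE cannot have full rank.

```stata
* Sketch on simulated data; case, x, y and ind1 are hypothetical names
clear
set seed 12345
set obs 100
generate int case = ceil(_n/5)          // 20 clusters ("cases") of 5 obs each
generate double x = rnormal()
generate double y = 1 + 0.5*x + rnormal()
generate byte ind1 = (case == 1)        // "industry" dummy covering only case 1

* The overall F is expected to be reported as missing here
regress y x ind1, vce(cluster case)

* OLS makes the case-1 residuals sum to zero and ind1 is 0 elsewhere,
* so the score for ind1, summed within each cluster, is zero everywhere
predict double ehat, residuals
generate double s_ind1 = ind1*ehat
collapse (sum) s_ind1, by(case)
list case s_ind1 in 1/3
```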

The clickable help for that missing F statistic, which you get when you cluster with a dummy that is nonzero for only one cluster, says:

Is there a regressor that is nonzero for only one observation?

The VCE you have just estimated is not of sufficient rank to perform the model test. This
can happen if there is a variable in your model that is nonzero for only a single observation
in the estimation sample. In that case the derivative of the sum-of-squares or likelihood
function with respect to that variable's parameter is zero for all observations. That
implies that the outer-product-of-gradients (OPG) variance matrix is singular. Since the OPG
variance matrix is used in computing the robust variance matrix, the latter is therefore
singular as well.
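The single-observation case described in that help text can be checked directly. This is a minimal sketch using the auto data shipped with Stata. The dummy dum and the choice of observation 10 are arbitrary. Once dum fits observation 10 exactly, that observation's residual is zero. The score for dum, which is proportional to dum times the residual, is then zero in every observation, and that zero column is what makes the OPG matrix singular.

```stata
* Uses the auto data shipped with Stata; dum is a hypothetical dummy
sysuse auto, clear
generate byte dum = (_n == 10)          // nonzero for observation 10 only
regress price mpg weight dum, vce(robust)

* The residual for observation 10 is zero, so the score for dum
* (proportional to dum times the residual) is zero in every observation
predict double ehat, residuals
generate double s_dum = dum*ehat
summarize s_dum
```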

I think StataCorp might want to expand this to include: "Is there a regressor that is nonzero for only one cluster when you are using the cluster option?"
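Until then, one way to find the offending dummies is to count, for each dummy, the distinct clusters in which it is nonzero within the estimation sample. This is a sketch on simulated data under several assumptions. The names case, ind1, ind2, x and y are hypothetical, the dummies are assumed to share the prefix ind, and the loop must run right after the regression so that e(sample) is available.

```stata
* Simulated data; case, x, y, ind1 and ind2 are hypothetical names
clear
set seed 2005
set obs 60
generate int case = ceil(_n/4)              // 15 clusters of 4 obs each
generate byte ind1 = (case == 1)            // industry with a single case
generate byte ind2 = inlist(case, 2, 3)     // industry with two cases
generate double x = rnormal()
generate double y = x + rnormal()
quietly regress y x ind1 ind2, vce(cluster case)

* Collect dummies that are nonzero in only one cluster of the estimation sample
local single
foreach v of varlist ind* {
    quietly egen byte t_tag = tag(case) if `v' != 0 & !missing(`v') & e(sample)
    quietly count if t_tag == 1             // number of distinct clusters
    if r(N) == 1 local single `single' `v'
    drop t_tag
}
display as text "nonzero in only one cluster: `single'"
```

Dropping such dummies, or merging those industries into a broader category, restores a VCE of sufficient rank for the model test.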

Kit Baum, Boston College Economics
http://ideas.repec.org/e/pba1.html
