Re: st: When number of regressors greater than the number of clusters in OLS regression
From: Steven Samuels <[email protected]>
To: [email protected]
Subject: Re: st: When number of regressors greater than the number of clusters in OLS regression
Date: Mon, 1 Sep 2008 19:19:48 -0400
Thanks Mark. I had been thinking that, because the data were not
*sampled* as clusters, there would be no cluster effects. That
assumption was wrong, and I agree that cluster effects should be
considered. As Vince Wiggins stated in
http://www.stata.com/statalist/archive/2005-10/msg00594.html ,
"We can use the [robust] covariance
matrix to test any subset of joint hypotheses that does not exceed
its rank." Thus Divya can get valid standard errors for single
coefficients, if she adds states as clusters, and can probably make
most of the inferences she is interested in.
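To make the rank point concrete, here is a minimal Python sketch (not
Stata output, and not Divya's data). It simulates a small regression
with G = 3 clusters and k = 5 coefficients and computes the "meat" of
the cluster-robust sandwich estimator, the sum over clusters of
s_g s_g' with s_g = X_g'e_g. Every name and number below is made up for
illustration, and exact fractions are used so that the rank is not
blurred by rounding.

```python
from fractions import Fraction
import random


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [a - M[r][col] * c for a, c in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def rank(A):
    """Exact rank of a matrix of Fractions via row reduction."""
    M = [row[:] for row in A]
    rk = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rk, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for r in range(rk + 1, len(M)):
            f = M[r][col] / M[rk][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[rk])]
        rk += 1
    return rk


# Hypothetical design: 3 clusters of 4 observations, intercept + 4 regressors.
rng = random.Random(2008)
G, per_cluster, k = 3, 4, 5
cluster = [g for g in range(G) for _ in range(per_cluster)]
X = [[Fraction(1)] + [Fraction(rng.randint(-5, 5)) for _ in range(k - 1)]
     for _ in cluster]
y = [Fraction(rng.randint(-10, 10)) for _ in cluster]

# OLS via the normal equations (X'X) b = X'y.
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
beta = solve(XtX, Xty)
resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]

# Cluster scores s_g = X_g' e_g and the meat matrix sum_g s_g s_g'.
scores = [[sum(row[i] * e for row, e, c in zip(X, resid, cluster) if c == g)
           for i in range(k)] for g in range(G)]
meat = [[sum(s[i] * s[j] for s in scores) for j in range(k)] for i in range(k)]

meat_rank = rank(meat)
print("coefficients:", k, "clusters:", G, "rank of meat:", meat_rank)
```

The meat is a sum of G outer products, so its rank cannot exceed G, and
because the OLS scores sum to zero across clusters it is in fact at most
G - 1. Pre- and post-multiplying by the nonsingular (X'X)^-1 does not
change the rank, so the full sandwich matrix is rank-deficient whenever
k exceeds that bound. A joint test of all k coefficients is then not
available, while tests on single coefficients or small subsets are.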
-xtreg- offers some intriguing possibilities, for it would
distinguish between state-level and district-level predictors of the
same kind. Of course statistics from neighboring districts may be
spatially correlated, opening up a completely different area of
analysis.
Perhaps the best advice to Divya that I can give, in addition to Mark's:
Clarify your purpose--is the study exploratory ("find a good
predictive model")? Or are you testing hypotheses about certain
predictors? If your analysis is exploratory, consider holding out a
random set of districts or states on which to test the fit of your
"best" models. If you are interested in certain predictors, then
others are potential effect modifiers and confounders. You probably
don't need them all. Do you have 25 predictors because you know they
are all important from other studies? The more unnecessary
predictors you have in one model, the more difficult it will be to
tease out the truly important ones.
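As a rough illustration of the hold-out idea, the sketch below sets
aside a random quarter of the states, so that every district in a
held-out state stays out of model building. It is only a sketch in
Python: the state labels, the 25% fraction, the seed, and the function
name split_holdout are all hypothetical choices, not anything from
Divya's data.

```python
import random


def split_holdout(units, holdout_frac=0.25, seed=12345):
    """Randomly split unique units (e.g. states) into training and hold-out sets."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = sorted(set(units))      # sort first so the result depends only on the seed
    rng.shuffle(shuffled)
    n_hold = max(1, round(len(shuffled) * holdout_frac))
    return set(shuffled[n_hold:]), set(shuffled[:n_hold])


# Hypothetical labels for 20 states; real data would supply its own identifiers.
states = [f"state_{i:02d}" for i in range(1, 21)]
train_states, holdout_states = split_holdout(states)

print("states used for model building:", len(train_states))
print("states held out for checking fit:", len(holdout_states))
```

Splitting by state rather than by district keeps each cluster entirely
on one side of the split, so within-state correlation cannot leak from
the model-building sample into the check of fit.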
-Steve
On Sep 1, 2008, at 6:00 PM, Schaffer, Mark E wrote:
Whether or not you need to use cluster-robust depends on whether you
think your data have a problem that cluster-robust can address, namely
(1) the error terms in your equation are correlated within states
because of unobserved heterogeneity (so the iid assumption fails), but
(2) the error terms are not correlated across states.
A good example would be whether you are looking at something that is
affected by state-level regulation, i.e., the laws regulating it vary
from state to state, but you don't have variables that control for
this somehow.