Thank you, David, for your comments. For the most part I concur with the
ideas you express. One clarification, regarding the following comment:
Just as we shouldn't be tempted to use stepwise methods to formulate
regression models, I don't think we should rely on automated processes for
diagnosing and solving problems of collinearity. Buzz Burhans has indicated
that "theoretical plausibility" is one of the criteria he used. Aside from
the estimated coefficients and standard errors (or CIs), which alert us to
the existence of the problem, I submit this is the only criterion that
should be used. (I assume that whatever procedure is followed, when dummy
variables are involved they are excluded in whole sets corresponding to the
original variables and not discarded willy-nilly.)
I am not pursuing an automated procedure. I do think that collinearity
arising from redundant or, more likely, partially redundant variables is a
real possibility, even in well-thought-out designs, and it seems more
likely still in some exploratory analyses. In any case, I agree that an
automated approach is not good, but I think a systematic approach should be
used to assess and, in some cases, deal with the issues thus revealed. If
the approach is not systematic, I am concerned that decisions about removal
on grounds of "theoretical plausibility", made solely on the basis of the
investigator's opinion of plausibility, may add bias, or at least
inconsistency, to the process of being informed by the data. It seems that
even when one agrees with your ideas, the need to assess and in some cases
eliminate collinearity may exist. In my case, I identified the problem, as
you suggest, by the behavior of the standard errors and coefficients. At
this point, obtaining more data is not a possibility. In the future, I
expect that collecting more data in this area may be informed by the issues
I have identified in the data I have, but it is nonetheless worth assessing
the existing data.
In any case, while I concur with your ideas, I remain interested in
possibilities for some systematic approaches to the issue of collinear
categorical variables.
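One candidate for such a systematic check, offered here only as a sketch and
not as anything proposed in this thread, is the generalized variance
inflation factor (GVIF) of Fox and Monette (1992). It assesses a whole set
of dummy variables as one term, which fits the point above that dummies
should be handled in whole sets. The example below is in Python with NumPy
rather than Stata; the function name gvif, the simulated data, and the
variable names (region, x, z) are all assumptions made for illustration.

```python
import numpy as np


def gvif(X, cols):
    """Generalized VIF for the block of columns `cols` in predictor matrix X.

    X holds predictors only (no constant column); `cols` lists the column
    indices of one term, e.g. every dummy built from one categorical variable.
    """
    X = np.asarray(X, dtype=float)
    cols = list(cols)
    rest = [j for j in range(X.shape[1]) if j not in cols]
    if not rest:  # nothing else in the model to be collinear with
        return 1.0
    R = np.corrcoef(X, rowvar=False)       # correlations among all predictors
    R11 = R[np.ix_(cols, cols)]            # within the term
    R22 = R[np.ix_(rest, rest)]            # among the remaining predictors
    return np.linalg.det(R11) * np.linalg.det(R22) / np.linalg.det(R)


# Simulated, purely illustrative data (all names are hypothetical).
rng = np.random.default_rng(0)
n = 500
region = rng.integers(0, 3, size=n)        # 3-level categorical variable
d1 = (region == 1).astype(float)           # dummies, level 0 is the base
d2 = (region == 2).astype(float)
x = 3 * d1 - 3 * d2 + rng.normal(scale=0.5, size=n)  # nearly redundant with region
z = rng.normal(size=n)                     # unrelated predictor

X = np.column_stack([d1, d2, x, z])
g_region = gvif(X, [0, 1])                 # both dummies assessed as one set
g_z = gvif(X, [3])

# GVIF**(1/(2*df)) puts terms with different numbers of columns on one scale.
print("region:", g_region, g_region ** (1 / (2 * 2)))
print("z:     ", g_z, g_z ** (1 / (2 * 1)))
```

For a term with a single column the GVIF reduces to the ordinary VIF,
1/(1 - R^2) from regressing that column on the others. A diagnostic like this
only flags which sets of variables overlap; deciding what, if anything, to
drop remains a substantive judgment.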
Thanks again for your response; it is much appreciated.
Buzz
As an aside, which means I'm not necessarily talking about Buzz Burhans'
situation, it's been my experience that far too many "problems" are blamed
on collinearity. A parameter estimate with a large variance is not by itself
a symptom of collinearity, for example. More often than not, it indicates an
irrelevant variable has been included in the analysis -- a theoretical
problem rather than a collinearity problem. In general, misspecification
errors are far more common than collinearity problems and should be ruled
out before suspecting collinearity.
Dave Moore
Buzz Burhans