Re: st: multicollinearity
Chris:
On Nov 19, 2008, at 3:06 PM, Chris Witte wrote:
On Nov 19, 2008, at 5:01 PM, Chris Witte wrote:
On Nov 19, 2008, at 8:11 PM, Chris Witte wrote:
1) The Statalist FAQ strongly suggests not posting the same message
multiple times. You may find it useful to review the FAQ at
<http://www.stata.com/support/faqs/res/statalist.html>.
Is there another way to get the following module (the link isn't
working for me)?
Example: Stata learning module on regression diagnostics:
Multicollinearity. UCLA Academic Technology Services, 12/03.
http://www.ats.ucla.edu/stat/stata/modules/reg/multico.htm
2) I suspect that page just doesn't exist anymore. Not too
surprising: it dates from December 2003, almost five years and a few
Stata versions ago. If you poke around the UCLA ATS web site, you
might find related materials. Also, Google is your friend (TM).
Also, I have read that -anova- and -regress- will drop variables
that have collinearity problems, but I have never had Stata drop
variables on me. For example:
sysuse auto, clear
reg price headroom trunk weight length turn displacement gear_ratio
[snip]
and the correlation between weight and length is 0.9460. Why
isn't one of these variables dropped? Does there have to be
perfect correlation for a variable to be dropped?
3) In a nutshell, yes. Multicollinearity and perfect collinearity
are not the same thing. Indeed, they are conceptually rather
different. (Your (sub)discipline may use slightly different terms
for these two concepts.) Kennedy's "A Guide to Econometrics" (for
example, the 5th edition, MIT Press, 2003) dedicates a whole chapter
to multicollinearity, and has a decent discussion of this
distinction. The explanation I have often given to my students is
that multicollinearity is a sample problem -- one that in many cases
could conceptually be avoided by collecting more or "better" data --
whereas perfect collinearity is a model or specification problem --
one that no amount of additional data will resolve. Mathematically,
with perfect collinearity the (X'X) matrix is rank deficient and
therefore not invertible: the OLS estimator simply does not exist in
that case. Stata thus drops collinear variables until (X'X) has full
rank, and the regression can then be estimated on the remaining
variables. Other members of Statalist suggested a few synthetic
examples of this in earlier replies.
Multicollinearity inflates variances, thereby complicating inference,
but it does not preclude estimation.
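To see the distinction in action, here is a quick sketch using the
auto data (the constructed variable wt2 is mine, purely for
illustration; the exact wording of Stata's collinearity note varies
across versions):

sysuse auto, clear
generate double wt2 = 2*weight   // wt2 is an exact linear combination of weight
regress price weight wt2         // perfect collinearity: Stata omits wt2 and says so
regress price weight length      // r = 0.9460, yet both variables are retained
estat vif                        // variance inflation factors flag the near-collinearity

Note that -estat vif- after -regress- reports variance inflation
factors, the standard diagnostic for the second situation.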
In Wooldridge's "Introductory Econometrics" textbook (for example,
pp. 102-4 of the 3rd edition, Thomson South-Western, 2006) there is a
very informative discussion of multicollinearity, which contains the
following useful insight:
"Worrying about high degrees of correlation among the independent
variables in the sample is really no different from worrying about a
small sample size: both work to increase [the variance of beta hat].
The famous University of Wisconsin econometrician Arthur Goldberger,
reacting to econometricians' obsession with multicollinearity, has
(tongue in cheek) coined the term MICRONUMEROSITY, which he defines
as the 'problem of small sample size.'"
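For reference, the variance expression behind that passage (the
standard OLS slope variance under homoskedasticity) is

Var(\hat{\beta}_j) = \sigma^2 / [ SST_j (1 - R_j^2) ]

where SST_j = \sum_i (x_{ij} - \bar{x}_j)^2 is the total sample
variation in x_j and R_j^2 is the R-squared from regressing x_j on
the other regressors. A small sample shrinks SST_j; multicollinearity
pushes R_j^2 toward 1. Both inflate the variance through the same
denominator, which is exactly Goldberger's point.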
Best,
Mike