RE: st: multicollinearity
Stata drops perfectly multicollinear variables. It won't drop variables that aren't perfectly collinear.
There are many silly examples one could make, e.g., a variable x and another variable -x both included in the regression, but if you want a slightly less obvious one, take a trichotomous variable and make three dummies from it. One is redundant; it doesn't matter which.
Back in the old days multicollinearity was a big numerical problem, because many cheap computing algorithms are inherently ill-conditioned and thus more unstable in the presence of collinearity than might otherwise be the case. It is still a numerical problem, but now that much better algorithms such as QR decomposition are used, the effect on the estimates is mitigated.
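Both cases can be tried on the auto data. This is only a sketch: the three-level variable size3, its weight cutoffs, the dummy stub sz, and negweight are made up for illustration and are not from the original question.

```stata
sysuse auto, clear

* Hypothetical trichotomous variable: three weight classes (cutoffs are arbitrary)
generate byte size3 = 1 + (weight > 2500) + (weight > 3500)

* One dummy per level: creates sz1, sz2, sz3
tabulate size3, generate(sz)

* sz1 + sz2 + sz3 equals the constant, so Stata keeps only two of the three
regress price sz1 sz2 sz3

* The x and -x case: negweight is an exact linear function of weight
generate negweight = -weight
regress price weight negweight
```

In each regression Stata should flag one of the perfectly collinear variables and estimate the model without it. Leaving out the constant (regress price sz1 sz2 sz3, noconstant) keeps all three dummies, since they are no longer redundant.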
The substantive problem with multicollinearity is that you can't untangle the effect of collinear variables from each other.
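One way to see how strong the near-collinearity is, without Stata dropping anything, is to look at variance inflation factors after the regression from the original question. A minimal sketch, assuming the auto data; -estat vif- is the standard postestimation command after -regress-, and the auxiliary regression is included only to show where the VIF for weight comes from, VIF = 1/(1 - R-squared).

```stata
sysuse auto, clear

* Pairwise correlation quoted in the question
correlate weight length

* Same model as in the question; nothing is dropped because no
* regressor is an exact linear combination of the others
regress price headroom trunk weight length turn displacement gear_ratio

* Variance inflation factors for each regressor
estat vif

* VIF for weight by hand: regress weight on the other regressors
regress weight headroom trunk length turn displacement gear_ratio
display "VIF for weight = " 1/(1 - e(r2))
```

A large VIF means the standard error of that coefficient is inflated because the other regressors explain most of the variable's variation; the coefficient is still estimated, just imprecisely.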
-----Original Message-----
From: "Chris Witte" <[email protected]>
To: [email protected]
Sent: 11/19/2008 3:06 PM
Subject: st: multicollinearity
Is there another way to get the following module (the link isn't working for me)?
Example . Stata learning module on regression diagnostics: Multicollinearity
. . . . . . . . . . . . . . . . . . UCLA Academic Technology Services
12/03 http://www.ats.ucla.edu/stat/stata/modules/reg/multico.htm
Also, I have read that -anova- and -regress- will drop variables that have collinearity problems, but I have never had Stata drop variables on me. For example:
sysuse auto
reg price headroom trunk weight length turn displacement gear_ratio
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  7,    66) =    8.18
       Model |   295089440     7  42155634.3           Prob > F      =  0.0000
    Residual |   339975956    66  5151150.85           R-squared     =  0.4647
-------------+------------------------------           Adj R-squared =  0.4079
       Total |   635065396    73  8699525.97           Root MSE      =  2269.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    headroom |  -788.1489   423.1895    -1.86   0.067    -1633.074    56.77608
       trunk |   109.2235   103.9332     1.05   0.297    -98.28582    316.7328
      weight |   5.300069   1.331056     3.98   0.000     2.642531    7.957607
      length |  -73.59571   42.42778    -1.73   0.087    -158.3055    11.11408
        turn |  -301.2525   124.9576    -2.41   0.019    -550.7384   -51.76669
displacement |    11.4282   7.622549     1.50   0.139    -3.790711    26.64711
  gear_ratio |   2236.615   1051.394     2.13   0.037     137.4391    4335.791
       _cons |   7795.908   6103.469     1.28   0.206    -4390.061    19981.88
------------------------------------------------------------------------------
and the correlation between weight and length is 0.9460. Why isn't one of these variables dropped? Does there have to be perfect correlation for variables to be dropped?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/