RE: st: multicollinearity
Variables with a correlation of 0.95 may be perfectly reasonable in some problems. Stata is *absolutely correct* to leave variable selection in those situations to you. Highly collinear predictors can be diagnosed in various ways, e.g., with -collin-.
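As one way to run such a diagnosis with built-in commands, here is a minimal sketch. It assumes the auto data and the variable list from the question quoted below, and it uses -correlate- and -estat vif- in place of the user-written -collin-:

```stata
sysuse auto, clear
* Pairwise correlation between the two suspect predictors
correlate weight length
* Fit the model from the question, then request variance inflation factors
regress price headroom trunk weight length turn displacement gear_ratio
estat vif
```

A large VIF flags a predictor that is nearly a linear combination of the others. A cutoff of 10 is a common rule of thumb, but it is a convention rather than a formal test.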
I tell my students that whenever they see Stata kicking out a variable due to collinearity or perfect prediction, it is THEIR job to figure out why. Chances are good that they are not fitting the model they thought they were fitting. Even if the model Stata chooses is statistically equivalent to the one they wanted, surely they have information that would lead them to pick a good reference variable?
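To see the difference between high and perfect collinearity, here is a minimal sketch using the auto data. The variable weight2 is hypothetical and is created only for illustration. Stata should estimate both coefficients when the predictors are merely highly correlated, and drop one predictor only when it is an exact linear function of another:

```stata
sysuse auto, clear
* weight2 is an exact multiple of weight (hypothetical, for illustration only)
generate weight2 = 2 * weight
* Highly but not perfectly correlated: both coefficients are estimated
regress price weight length
* Perfectly collinear: Stata notes that one of the two variables was dropped
regress price weight weight2
```

The exact wording of the note in the second regression depends on the Stata version.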
-----Original Message-----
From: "Chris Witte" <[email protected]>
To: [email protected]
Sent: 11/19/2008 8:11 PM
Subject: st: multicollinearity
I have read that -anova- and -regress- will drop variables that have collinearity problems, but I have never had Stata drop variables on me. For example:
sysuse auto
reg price headroom trunk weight length turn displacement gear_ratio
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  7,    66) =    8.18
       Model |   295089440     7  42155634.3           Prob > F      =  0.0000
    Residual |   339975956    66  5151150.85           R-squared     =  0.4647
-------------+------------------------------           Adj R-squared =  0.4079
       Total |   635065396    73  8699525.97           Root MSE      =  2269.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    headroom |  -788.1489   423.1895    -1.86   0.067    -1633.074    56.77608
       trunk |   109.2235   103.9332     1.05   0.297    -98.28582    316.7328
      weight |   5.300069   1.331056     3.98   0.000     2.642531    7.957607
      length |  -73.59571   42.42778    -1.73   0.087    -158.3055    11.11408
        turn |  -301.2525   124.9576    -2.41   0.019    -550.7384   -51.76669
displacement |    11.4282   7.622549     1.50   0.139    -3.790711    26.64711
  gear_ratio |   2236.615   1051.394     2.13   0.037     137.4391    4335.791
       _cons |   7795.908   6103.469     1.28   0.206    -4390.061    19981.88
------------------------------------------------------------------------------
and the correlation between weight and length is 0.9460. Why isn't one of these variables dropped? Does there have to be perfect correlation before variables are dropped?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/