Why do estimation commands sometimes omit variables?
Title    Estimation commands and omitted variables
Author   James Hardin, StataCorp
When you run a regression (or other estimation command) and the estimation
routine omits a variable, it does so because of a dependency among the
independent variables in the proposed model. You can identify this
dependency by running a regression where you specify the omitted variable as
the dependent variable and the remaining variables as the independent
variables. Below we generate a dependency on purpose to illustrate:
. sysuse auto
(1978 automobile data)
. generate newvar = price + 2.4*weight - 1.2*displ
. regress trunk price weight mpg foreign newvar displ
note: weight omitted because of collinearity.
      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(5, 68)        =     12.03
       Model |  626.913967         5  125.382793   Prob > F        =    0.0000
    Residual |  708.707655        68  10.4221714   R-squared       =    0.4694
-------------+----------------------------------   Adj R-squared   =    0.4304
       Total |  1335.62162        73  18.2961866   Root MSE        =    3.2283

------------------------------------------------------------------------------
       trunk | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |  -.0017329   .0006706    -2.58   0.012    -.0030711   -.0003947
      weight |          0  (omitted)
         mpg |  -.0709254   .1125374    -0.63   0.531    -.2954903    .1536395
     foreign |   1.374419   1.287406     1.07   0.289    -1.194561    3.943399
      newvar |   .0015145   .0005881     2.58   0.012     .0003411     .002688
displacement |    .007182   .0092692     0.77   0.441    -.0113143    .0256783
       _cons |   4.170958   5.277511     0.79   0.432    -6.360151    14.70207
------------------------------------------------------------------------------
The regression omitted one of the variables involved in the dependency we
created. Which variable it omits is somewhat arbitrary, but it will always be
one of the variables in the dependency. To find out what that dependency is,
we run a second regression with the omitted variable as the dependent
variable and the remaining independent variables from the original regression
as the regressors.
. regress weight price mpg foreign newvar displ
      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(5, 68)        >  99999.00
       Model |  44094178.4         5  8818835.68   Prob > F        =    0.0000
    Residual |  6.9847e-07        68  1.0272e-08   R-squared       =    1.0000
-------------+----------------------------------   Adj R-squared   =    1.0000
       Total |  44094178.4        73  604029.841   Root MSE        =     .0001

------------------------------------------------------------------------------
      weight | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |  -.4166667   2.11e-08  -2.0e+07   0.000    -.4166667   -.4166667
         mpg |   4.40e-06   3.53e-06     1.25   0.217    -2.65e-06    .0000115
     foreign |    .000041   .0000404     1.02   0.314    -.0000396    .0001217
      newvar |   .4166667   1.85e-08   2.3e+07   0.000     .4166667    .4166667
displacement |   .4999999   2.91e-07   1.7e+06   0.000     .4999993    .5000005
       _cons |  -.0002082   .0001657    -1.26   0.213    -.0005388    .0001224
------------------------------------------------------------------------------
The regression in which the omitted variable is the dependent variable has an
R-squared of 1.0000, and its residual sum of squares is (nearly) zero. The
coefficients show the relationship between weight and the price, newvar, and
displacement variables. The output of this regression tells us that we have
the dependency

weight = -.4166667*price + .4166667*newvar + .4999999*displacement

which is equivalent to the dependency that we defined above: rearranging
newvar = price + 2.4*weight - 1.2*displacement gives
weight = (newvar - price + 1.2*displacement)/2.4, and 1/2.4 = .4166667 while
1.2/2.4 = .5.
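The same diagnostic carries over to other environments. Below is a minimal
NumPy sketch, using simulated data rather than auto.dta (the variable names
and distributions are illustrative assumptions, not the Stata dataset): we
plant the same dependency, confirm that the design matrix is rank deficient,
and then regress the dependent column on the others to recover the
coefficients of the dependency, just as the second Stata regression did.

```python
import numpy as np

# Simulated stand-ins for price, weight, and displacement (assumed
# distributions, chosen only for illustration).
rng = np.random.default_rng(0)
n = 74
price = rng.normal(6000, 2000, n)
weight = rng.normal(3000, 700, n)
displacement = rng.normal(200, 90, n)

# Plant the same dependency as in the Stata example.
newvar = price + 2.4 * weight - 1.2 * displacement

# Four columns but only three independent directions: rank deficiency
# is the numerical signature of collinearity.
X = np.column_stack([price, weight, displacement, newvar])
rank = np.linalg.matrix_rank(X)  # 3, one fewer than the 4 columns

# Regress weight on the remaining variables (with a constant), as in
# the second Stata regression; lstsq recovers the dependency exactly.
Z = np.column_stack([np.ones(n), price, displacement, newvar])
coef, *_ = np.linalg.lstsq(Z, weight, rcond=None)

# coef[1:] is approximately [-1/2.4, 1.2/2.4, 1/2.4],
# i.e. [-.4166667, .5, .4166667], matching the Stata output.
print(rank, np.round(coef[1:], 7))
```

The rank check tells you that some dependency exists; the auxiliary
regression tells you which variables it involves and with what weights.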