
Why do estimation commands sometimes omit variables?

Title   Estimation commands and omitted variables
Author James Hardin, StataCorp
Date August 1996; minor revision July 2005; updated July 2009

When you run a regression (or other estimation command) and the estimation routine omits a variable, it does so because of a linear dependency (collinearity) among the independent variables in the proposed model. You can identify this dependency by running a regression in which the omitted variable is the dependent variable and the remaining variables are the independent variables. Below, we create a dependency on purpose to illustrate:

. sysuse auto
(1978 Automobile Data)

. generate newvar = price + 2.4*weight - 1.2*displ

. regress trunk price weight mpg foreign newvar displ
 note: weight omitted because of collinearity

       Source |       SS       df       MS              Number of obs =      74
 -------------+------------------------------           F(  5,    68) =   12.03
        Model |  626.913967     5  125.382793           Prob > F      =  0.0000
     Residual |  708.707655    68  10.4221714           R-squared     =  0.4694
 -------------+------------------------------           Adj R-squared =  0.4304
        Total |  1335.62162    73  18.2961866           Root MSE      =  3.2283

 ------------------------------------------------------------------------------
        trunk |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
        price |  -.0017329   .0006706    -2.58   0.012    -.0030711   -.0003947
       weight |  (omitted)
          mpg |  -.0709254   .1125374    -0.63   0.531    -.2954903    .1536395
      foreign |   1.374419   1.287406     1.07   0.289    -1.194561    3.943399
       newvar |   .0015145   .0005881     2.58   0.012     .0003411     .002688
 displacement |    .007182   .0092692     0.77   0.441    -.0113143    .0256783
        _cons |   4.170958   5.277511     0.79   0.432    -6.360151    14.70207
 ------------------------------------------------------------------------------
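The collinearity note printed above can also be obtained without fitting the model. The following is a minimal sketch, assuming the programmer's command _rmcoll is available and behaves as in Stata 11 or later; the saved-result names r(k_omitted) and r(varlist) are taken from that version's documentation and should be confirmed with help _rmcoll before relying on them.

```stata
* Sketch: check a list of regressors for collinearity without running regress.
sysuse auto, clear
generate newvar = price + 2.4*weight - 1.2*displacement

* _rmcoll prints a note for each variable it would omit
_rmcoll price weight mpg foreign newvar displacement

* Saved results (names assumed from the Stata 11+ documentation)
display "number omitted: " r(k_omitted)
display "variable list:  `r(varlist)'"
```

This only reports that a dependency exists and which variable would be dropped; it does not show the form of the dependency.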

The regression omitted one of the variables in the dependency that we created. Which variable it omits is somewhat arbitrary, but it will always omit one of the variables in the dependency. To find out what that dependency is, we regress the omitted variable on the remaining independent variables from the original regression.

. regress weight price mpg foreign newvar displ

       Source |       SS       df       MS              Number of obs =      74
 -------------+------------------------------           F(  5,    68) =       .
        Model |  44094178.4     5  8818835.68           Prob > F      =  0.0000
     Residual |  6.9847e-07    68  1.0272e-08           R-squared     =  1.0000
 -------------+------------------------------           Adj R-squared =  1.0000
        Total |  44094178.4    73  604029.841           Root MSE      =   .0001

 ------------------------------------------------------------------------------
       weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
        price |  -.4166667   2.11e-08 -2.0e+07   0.000    -.4166667   -.4166667
          mpg |   4.40e-06   3.53e-06     1.25   0.217    -2.65e-06    .0000115
      foreign |    .000041   .0000404     1.02   0.314    -.0000396    .0001217
       newvar |   .4166667   1.85e-08  2.3e+07   0.000     .4166667    .4166667
 displacement |   .4999999   2.91e-07  1.7e+06   0.000     .4999993    .5000005
        _cons |  -.0002082   .0001657    -1.26   0.213    -.0005388    .0001224
 ------------------------------------------------------------------------------

The regression in which the omitted variable was the dependent variable has an R-squared of 1.0000, and the residual sum of squares is zero (well, nearly). The coefficients show how weight is related to the price, newvar, and displacement variables; the coefficients on mpg and foreign and the constant are essentially zero. The output of this regression tells us that we have the dependency

weight = -.4166667*price + .4166667*newvar + .4999999*displacement 

which is equivalent to the dependency that we defined above.
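To see the equivalence, solve the generate statement for weight: weight = (newvar - price + 1.2*displacement)/2.4, so the expected coefficients are 1/2.4 on newvar, -1/2.4 on price, and 1.2/2.4 on displacement. The sketch below is one way to check this numerically; the variable names check and diff are hypothetical and are not part of the original example. Because generate stores newvar as a float by default, the match should be very close rather than exact, which is consistent with the tiny but nonzero residual sum of squares reported above.

```stata
* Sketch: verify that the recovered dependency reproduces weight.
sysuse auto, clear
generate newvar = price + 2.4*weight - 1.2*displacement

* Coefficients implied by solving the generate statement for weight
display 1/2.4        // coefficient on newvar (and, negated, on price)
display 1.2/2.4      // coefficient on displacement

* Rebuild weight from the other three variables (check is a hypothetical name)
generate double check = (newvar - price + 1.2*displacement)/2.4

* The difference should be tiny but need not be exactly zero,
* because newvar was stored with float precision
generate double diff = weight - check
summarize diff
```

If the minimum and maximum reported by summarize are both close to zero, the dependency read from the regression output matches the one that was constructed.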
