
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Multicollinearity Problem in Stata


From   Yuval Arbel <[email protected]>
To   statalist <[email protected]>
Subject   Re: st: Multicollinearity Problem in Stata
Date   Mon, 29 Jul 2013 09:58:30 -0700

Dear FU,

This outcome is not strange at all. I believe what you encountered is
known in econometrics as "the dummy variable trap":

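As you note, r_ew + r_ow is constant (0.2407656 in your data). Consequently,
when you run the model with a constant, r_ow and r_ew are perfectly
collinear with the constant term: the constant column equals
(r_ow + r_ew)/0.2407656, so Stata has to drop one of the variables. When
you omit the constant, that linear dependence disappears and the problem
is solved.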
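A minimal Python sketch of this point, assuming made-up values for r_ow and constructing r_ew so that r_ow + r_ew = 0.2407656 for every row (the identity reported in the question). The helper `matrix_rank` is illustrative, not a Stata routine; it uses exact `fractions.Fraction` arithmetic so no tolerance is needed. The design matrix [1, r_ow, r_ew] comes out with rank 2 instead of 3, while [r_ow, r_ew] has full column rank 2.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) via exact Gauss-Jordan elimination."""
    m = [list(r) for r in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((i for i in range(rank, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for i in range(n_rows):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


# Hypothetical r_ow values; r_ew is built so that r_ow + r_ew = C in every row.
C = Fraction("0.2407656")
r_ow = [Fraction("0.05"), Fraction("0.10"), Fraction("0.12"), Fraction("0.20")]
r_ew = [C - x for x in r_ow]

with_constant = [[Fraction(1), a, b] for a, b in zip(r_ow, r_ew)]
without_constant = [[a, b] for a, b in zip(r_ow, r_ew)]

print(matrix_rank(with_constant), "of 3 columns")     # rank deficient
print(matrix_rank(without_constant), "of 2 columns")  # full column rank
```

With the constant, one of the three columns is redundant, which is what Stata's "omitted because of collinearity" note reports; without it, nothing has to be dropped.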

In fact, you can make use of both specifications. Consider the
following exercise. Let w be the wage, let male = 1 for men and 0 for
women, and let female = 1 for women and 0 for men (so male + female = 1).
If the average wage is 1200 for men and 1000 for women and you run the
model without the constant, you get:

w(hat)=1200*male+1000*female

But if you omit male and include the constant (to avoid the dummy
variable trap), you get

w(hat)=1200-200*female
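A minimal Python sketch of this exercise, assuming four made-up wage observations whose group means are 1200 (men) and 1000 (women). The helpers `solve` and `ols` are illustrative, not Stata commands; they solve the OLS normal equations exactly with `fractions.Fraction`.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    # Augmented matrix [a | b], converted to Fractions so no rounding occurs.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Raises StopIteration if the system is singular
        # (e.g. a constant plus both dummies: the dummy variable trap).
        pivot = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col:
                factor = m[i][col] / m[col][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: two men (mean 1200) and two women (mean 1000).
wage = [Fraction(v) for v in (1100, 1300, 900, 1100)]
male = [1, 1, 0, 0]
female = [1 - m for m in male]

# No constant, both dummies: the coefficients are the two group means.
b_no_cons = ols([[m, f] for m, f in zip(male, female)], wage)
# Constant plus female only: intercept = male mean, slope = female - male gap.
b_cons = ols([[1, f] for f in female], wage)

print([int(b) for b in b_no_cons])  # [1200, 1000]
print([int(b) for b in b_cons])     # [1200, -200]
```

Both fits describe the same two group means; they only differ in how the coefficients are parameterized.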

The second specification is more common because the coefficient on
female (-200) is the wage gap itself, so its t-test directly tells you
whether the wage difference between the genders is significant.
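The same logic can be checked against the two regressions in the question. A minimal Python sketch, using only the coefficients copied from the two tables and the identity r_ow + r_ew = 0.2407656: substituting r_ew = 0.2407656 - r_ow into the model without intercept gives 8.080053*0.2407656 + (1.929168 - 8.080053)*r_ow, which should reproduce the intercept and the r_ow coefficient of the model with intercept if the two are the same model written two ways. The variable names are illustrative.

```python
C = 0.2407656  # r_ow + r_ew for every observation, as reported in the question

# Coefficients copied from the two tables in the question.
cons_with = 1.945399   # _cons, model with intercept (r_ew omitted)
b_ow_with = -6.150886  # r_ow, model with intercept
b_ow_none = 1.929168   # r_ow, model without intercept
b_ew_none = 8.080053   # r_ew, model without intercept

# Substituting r_ew = C - r_ow into the no-intercept model:
#   b_ow_none*r_ow + b_ew_none*(C - r_ow)
#     = b_ew_none*C + (b_ow_none - b_ew_none)*r_ow
implied_cons = b_ew_none * C
implied_b_ow = b_ow_none - b_ew_none

print(f"implied _cons = {implied_cons:.6f} (reported {cons_with})")
print(f"implied r_ow  = {implied_b_ow:.6f} (reported {b_ow_with})")
```

Both pairs agree up to rounding in the last displayed digit, and the lnnc and n1_ln coefficients and the Root MSE (.88944) are identical in the two tables. The different R-squared and F statistics are expected: without a constant, Stata measures R-squared around zero rather than around the mean of n2_ln, so the two values are not comparable.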

On Mon, Jul 29, 2013 at 9:10 AM, FU Youyan <[email protected]> wrote:
> Dear Statalist users,
>
> I am encountering a strange multicollinearity problem when I run a regression in Stata. The problem is illustrated below. I would very much appreciate it if any of you could answer my question.
>
>
> *****************************************************************************************************
> note: r_ew omitted because of collinearity
>
> Linear regression                                      Number of obs =     159
>                                                        F(  3,   155) =   73.74
>                                                        Prob > F      =  0.0000
>                                                        R-squared     =  0.4900
>                                                        Root MSE      =  .88944
>
> ------------------------------------------------------------------------------
>              |               Robust
>        n2_ln |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         r_ow |  -6.150886   1.861984    -3.30   0.001    -9.829026   -2.472746
>         r_ew |          0       (omitted)
>         lnnc |   .1853104   .0502188     3.69   0.000     .0861089    .2845119
>        n1_ln |   .2328174   .0912362     2.55   0.012     .0525905    .4130443
>        _cons |   1.945399   .5489629     3.54   0.001     .8609843    3.029813
> ------------------------------------------------------------------------------
>
> In the above regression table, r_ew is omitted because of the perfect negative collinearity between r_ow and r_ew.
>
> (The correlation table is shown below.) The two variables satisfy r_ow + r_ew = 0.2407656, so they are perfectly collinear.
>
>
>              |    n2_ln     r_ow     r_ew     lnnc    n1_ln
> -------------+---------------------------------------------
>        n2_ln |   1.0000
>         r_ow |  -0.6565   1.0000
>         r_ew |   0.6565  -1.0000   1.0000
>         lnnc |   0.4587  -0.4285   0.4285   1.0000
>        n1_ln |   0.6419  -0.8468   0.8468   0.4103   1.0000
>
> However, r_ew is not omitted when I run exactly the same regression without the intercept.
>
>
> Linear regression                                      Number of obs =     159
>                                                        F(  4,   155) =  441.13
>                                                        Prob > F      =  0.0000
>                                                        R-squared     =  0.8909
>                                                        Root MSE      =  .88944
>
> ------------------------------------------------------------------------------
>              |               Robust
>        n2_ln |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         r_ow |   1.929168   .8763971     2.20   0.029     .1979442    3.660391
>         r_ew |   8.080053   2.280073     3.54   0.001     3.576027    12.58408
>         lnnc |   .1853104   .0502188     3.69   0.000     .0861089    .2845119
>        n1_ln |   .2328174   .0912363     2.55   0.012     .0525905    .4130443
> ------------------------------------------------------------------------------
>
> My questions are: why does Stata not omit r_ew when the intercept is excluded, and is the regression result without the intercept valid?
>
>
> Thanks for your help.
> Youyan
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/



-- 
Dr. Yuval Arbel
School of Business
Carmel Academic Center
4 Shaar Palmer Street,
Haifa 33031, Israel
e-mail1: [email protected]
e-mail2: [email protected]
You can access my latest paper on SSRN at:  http://ssrn.com/abstract=2263398
You can access previous papers on SSRN at: http://ssrn.com/author=1313670


© Copyright 1996–2018 StataCorp LLC