RE: st: Multicollinearity Problem in Stata
From: FU Youyan <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: Multicollinearity Problem in Stata
Date: Mon, 29 Jul 2013 21:14:52 +0100
Dear Yuval,
Thank you very much for this answer; it is quite helpful. I have a follow-up question:
r_ew and r_ow are two types of investment return in my research (they are continuous variables rather than dummies), and what I want to test is the impact of these two returns on investors' future behavior. In other words, I want to know how investors weight these two types of return, so I have to include both returns in my regression. In the regression with a constant but omitting r_ew, the coefficient of r_ow is significantly negative (t-value = -3.30). However, in the regression without a constant but including r_ew, the coefficient of r_ow is significantly positive (t-value = 2.20). So I would like to know: which result is more reliable?
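For reference, the two regressions I am comparing correspond roughly to the following commands (a sketch only; I am inferring the syntax from the output I posted earlier, and the variable names are as shown there):

regress n2_ln r_ow r_ew lnnc n1_ln, vce(robust)               // with a constant: Stata drops r_ew because of collinearity
regress n2_ln r_ow r_ew lnnc n1_ln, vce(robust) noconstant    // without a constant: both returns are kept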
Best wishes,
Youyan
________________________________________
From: [email protected] [[email protected]] On Behalf Of Yuval Arbel [[email protected]]
Sent: 29 July 2013 17:58
To: statalist
Subject: Re: st: Multicollinearity Problem in Stata
Dear FU,
This outcome is not strange at all. I believe what you encountered is known in econometrics as "the dummy variable trap": I believe that r_ew + r_ow = constant. Consequently, when you run the model with a constant, you get perfect collinearity with the constant term. But when you omit the constant, the problem is solved.
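You can see this in your own numbers (a sketch, taking the relation r_ow + r_ew = 0.2407656 from your post as exact, and writing b_ow and b_ew for the two slopes in the no-constant regression). Substituting r_ew = 0.2407656 - r_ow gives:

b_ow*r_ow + b_ew*r_ew = b_ew*0.2407656 + (b_ow - b_ew)*r_ow

so the no-constant model with both returns is just a reparameterization of the with-constant model with r_ow alone. Plugging in your estimates, 8.080053*0.2407656 is about 1.9454, which matches the reported _cons of 1.945399, and 1.929168 - 8.080053 = -6.150885 matches the reported r_ow coefficient of -6.150886; the identical Root MSE in the two outputs points the same way.

A quick way to check the underlying relation in Stata, assuming r_ow and r_ew are in memory under the names shown in your output, is to inspect their sum directly:

generate double rsum = r_ow + r_ew   // should be constant if the dummy-variable-trap reading is right
summarize rsum                       // Min equal to Max (up to rounding) confirms the exact linear dependence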
In fact, you can make use of both specifications. Consider the following exercise. Let's say that w is the wage, male = 0 for females and 1 for males, and female = 1 for females and 0 for males. If the average wage is 1200 for males and 1000 for females and you run the model without the constant, you get:
w(hat) = 1200*male + 1000*female
But if you omit male and include a constant (in order to avoid the dummy variable trap), you get:
w(hat) = 1200 - 200*female
The second specification is more common because it permits you to test whether wage differences across gender are significant.
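If you want to see this mechanically, here is a minimal Stata sketch of the exercise; the data and the variable names (w, male, female) are made up purely for illustration:

* hypothetical data: wage w and the two gender dummies (male + female = 1)
clear
set obs 100
generate byte male = _n <= 50
generate byte female = 1 - male
generate double w = 1000 + 200*male + rnormal(0, 50)

* specification 1: both dummies, no constant -> the coefficients are the two group means
regress w male female, noconstant

* specification 2: constant plus the female dummy -> the coefficient on female is the
* female-minus-male difference (about -200 here), and its t-test is the test of a gender gap
regress w female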
On Mon, Jul 29, 2013 at 9:10 AM, FU Youyan <[email protected]> wrote:
> Dear Statalist users,
>
> I am encountering a strange multicollinearity problem when I run a regression in Stata. The problem is illustrated below. I would very much appreciate it if any of you could answer my question.
>
>
> *****************************************************************************************************
> note: r_ew omitted because of collinearity
>
> Linear regression Number of obs = 159
> F( 3, 155) = 73.74
> Prob > F = 0.0000
> R-squared = 0.4900
> Root MSE = .88944
>
> ------------------------------------------------------------------------------
> | Robust
> n2_ln | Coef. Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> r_ow | -6.150886 1.861984 -3.30 0.001 -9.829026 -2.472746
> r_ew | 0 (omitted)
> lnnc | .1853104 .0502188 3.69 0.000 .0861089 .2845119
> n1_ln | .2328174 .0912362 2.55 0.012 .0525905 .4130443
> _cons | 1.945399 .5489629 3.54 0.001 .8609843 3.029813
> ------------------------------------------------------------------------------
>
> In the regression table above, r_ew is omitted because of the perfect negative correlation between r_ow and r_ew.
>
> (The correlation table is shown below.) The relationship between these two variables is r_ow + r_ew = 0.2407656, so there is perfect collinearity.
>
>
> | n2_ln r_ow r_ew lnnc n1_ln
> -------------+---------------------------------------------
> n2_ln | 1.0000
> r_ow | -0.6565 1.0000
> r_ew | 0.6565 -1.0000 1.0000
> lnnc | 0.4587 -0.4285 0.4285 1.0000
> n1_ln | 0.6419 -0.8468 0.8468 0.4103 1.0000
>
> However, r_ew is not omitted when I run exactly the same regression without an intercept.
>
>
> Linear regression Number of obs = 159
> F( 4, 155) = 441.13
> Prob > F = 0.0000
> R-squared = 0.8909
> Root MSE = .88944
>
> ------------------------------------------------------------------------------
> | Robust
> n2_ln | Coef. Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> r_ow | 1.929168 .8763971 2.20 0.029 .1979442 3.660391
> r_ew | 8.080053 2.280073 3.54 0.001 3.576027 12.58408
> lnnc | .1853104 .0502188 3.69 0.000 .0861089 .2845119
> n1_ln | .2328174 .0912363 2.55 0.012 .0525905 .4130443
> ------------------------------------------------------------------------------
>
> My question is: why does Stata not omit r_ew when the intercept term is excluded? And is the regression result without an intercept valid?
>
>
> Thanks for your help.
> Youyan
>
--
Dr. Yuval Arbel
School of Business
Carmel Academic Center
4 Shaar Palmer Street,
Haifa 33031, Israel
e-mail1: [email protected]
e-mail2: [email protected]
You can access my latest paper on SSRN at: http://ssrn.com/abstract=2263398
You can access previous papers on SSRN at: http://ssrn.com/author=1313670
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/