Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: regression factor variables and multicollinearity
From: simone pedemonte <[email protected]>
To: [email protected]
Subject: Re: st: regression factor variables and multicollinearity
Date: Fri, 14 Feb 2014 10:35:01 +0000
Hello Maarten,
thank you for the response. The same reference categories are omitted
in both regressions. The two regressions give exactly the same
results, except that Stata omits (due to collinearity) different
coefficients in each of the two regressions.
For example: in one regression it will omit the coefficient for
1.d1#1.d2#1.d3#5.x, and estimate the coefficient for 1.d1#1.d2#5.x,
while in the other regression it will estimate the coefficient for
1.d1#1.d2#1.d3#5.x, while omitting the coefficient for 1.d1#1.d2#5.x
(all other coefficients etc. are identical and reference categories
are the same).
My guess is that this is due to the order in which Stata estimates the
coefficients in the two regressions, so I wonder whether there is a
way to tell Stata what order to follow. Thanks again,
Simone
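
P.S. For readers of the archive: a minimal sketch of the kind of
base-category control being asked about here, using Stata's ib.
operator and fvset. The variable names match the example in this
thread; the choice of level 2 as base is purely illustrative.

```stata
* Hypothetical sketch: force level 2 of x (rather than the default
* lowest level) to be the base category in the factor-variable
* regression -- one way to influence which terms Stata omits.
reg y d1##d2##d3##ib2.x

* Alternatively, set the base level once with fvset, so that any
* later factor-variable specification picks it up automatically.
fvset base 2 x
reg y d1##d2##d3##x
```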
On Fri, Feb 14, 2014 at 10:11 AM, Maarten Buis <[email protected]> wrote:
> If you create the interactions using factor variables, then Stata
> knows that it needs to omit a reference category for each categorical
> variable, and it has a fixed rule for doing so: by default it omits
> the category with the smallest numerical value, but you can change
> that. If you create the variables yourself, Stata just sees a set of
> variables and has no a priori knowledge that it might need to omit
> one as a reference category. Only once it starts working with those
> variables does it find that there is a problem with perfect
> multicollinearity. It then knows which set of variables belong
> together, in the sense that they cause perfect multicollinearity, but
> it cannot apply a standard rule for which one to omit, as it does
> with factor variables. So I would not expect the two methods to
> result in the same reference category.
>
> Hope this helps,
> Maarten
>
> On Thu, Feb 13, 2014 at 8:27 PM, simone pedemonte
> <[email protected]> wrote:
>> Dear all,
>>
>> I am running a linear regression on three dummy variables, one
>> categorical variable, plus all their interactions.
>>
>> I understand the basic Stata command to do this should be:
>>
>> reg y d1##d2##d3##x
>>
>> where x has six categories.
>>
>> I tried to create the variables myself to double check, so I generated
>> variables for the full set of interactions, and I ran the same
>> regression (omitting the same reference categories in both regressions).
>>
>> I can't understand why the two regressions give me exactly the same
>> results (coefficients, std errors, etc.) except for the fact that
>> different coefficients are omitted by Stata because of collinearity.
>> Does anyone know why that could happen?
>>
>> Thank you,
>>
>> Simone Pedemonte
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
>
>
>
> --
> ---------------------------------
> Maarten L. Buis
> WZB
> Reichpietschufer 50
> 10785 Berlin
> Germany
>
> http://www.maartenbuis.nl
> ---------------------------------