Re: st: Understanding Factor variables - is order significant ?
From
"Michael N. Mitchell" <[email protected]>
To
[email protected]
Subject
Re: st: Understanding Factor variables - is order significant ?
Date
Tue, 25 May 2010 19:19:07 -0700
Dear Richard
I think I need to use my glasses!
Yes, Richard, you are exactly on target. It relates to the use of -#- instead of -##- .
My previous answer still holds, in the sense that a#b and b#a give you a different
reference "cell", which re-scrambles the coding.
However, a##b and b##a give the same results, as shown in the example below using the
auto dataset with a simple regression...
. sysuse auto
(1978 Automobile Data)
. generate bigtrunk = trunk > 15
. generate biglen = length > 190
. regress mpg bigtrunk##biglen
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  3,    70) =   23.21
       Model |  1218.59972     3  406.199906           Prob > F      =  0.0000
    Residual |  1224.85974    70  17.4979963           R-squared     =  0.4987
-------------+------------------------------           Adj R-squared =  0.4772
       Total |  2443.45946    73  33.4720474           Root MSE      =  4.1831

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  1.bigtrunk |  -1.939394    2.52248    -0.77   0.445    -6.970323    3.091535
    1.biglen |  -7.806061   1.509981    -5.17   0.000    -10.81762   -4.794499
             |
    bigtrunk#|
      biglen |
        1 1  |    1.35368   2.955949     0.46   0.648    -4.541775    7.249134
             |
       _cons |   25.60606   .7281774    35.16   0.000     24.15376    27.05836
------------------------------------------------------------------------------
.
. regress mpg biglen##bigtrunk
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  3,    70) =   23.21
       Model |  1218.59972     3  406.199906           Prob > F      =  0.0000
    Residual |  1224.85974    70  17.4979963           R-squared     =  0.4987
-------------+------------------------------           Adj R-squared =  0.4772
       Total |  2443.45946    73  33.4720474           Root MSE      =  4.1831

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.biglen |  -7.806061   1.509981    -5.17   0.000    -10.81762   -4.794499
  1.bigtrunk |  -1.939394    2.52248    -0.77   0.445    -6.970323    3.091535
             |
      biglen#|
    bigtrunk |
        1 1  |    1.35368   2.955949     0.46   0.648    -4.541775    7.249134
             |
       _cons |   25.60606   .7281774    35.16   0.000     24.15376    27.05836
------------------------------------------------------------------------------
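To compare the two orderings without reading the tables by eye, the fits can be stored and tabulated side by side. This is only a sketch: the names ab, ba, and full are arbitrary labels chosen for illustration, and the third model writes a##b out as main effects plus a#b, which is what ## is shorthand for.

```stata
* Rebuild the example data from the auto dataset
sysuse auto, clear
generate bigtrunk = trunk > 15
generate biglen = length > 190

* Fit the same model with the terms in both orders
quietly regress mpg bigtrunk##biglen
estimates store ab
quietly regress mpg biglen##bigtrunk
estimates store ba

* Spell out ## by hand: main effects plus the # interaction
quietly regress mpg i.bigtrunk i.biglen bigtrunk#biglen
estimates store full

* Coefficients side by side, with fit statistics at the bottom
estimates table ab ba full, stats(N r2 rmse)
```

If the orderings are equivalent, the fit statistics in every column should match the R-squared and Root MSE shown above. The interaction term may be listed on separate rows because its name depends on the order of the variables.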
I hope that helps,
Michael N. Mitchell
Data Management Using Stata - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week - http://www.MichaelNormanMitchell.com
On 2010-05-25 8.06 PM, Richard Williams wrote:
At 08:32 PM 5/25/2010, Michael N. Mitchell wrote:
Extend that idea to your interaction... Suppose you flip the coding of
your "ra" and "dm" variables. Note that the test of the interaction,
the p value, will remain the same (assuming both are dummy variables).
The coefficients of "ra" and "dm" will change as well, due to the
change in coding. The details get more complicated, but are explained
in section 3.5 of
http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter3/statareg3.htm
It is explained using the old "xi" terminology, but the issues are
still the same.
He is not changing the coding though. He is just flipping the placement
of the terms, i.e. b1.ra#b0.dm in one model and b0.dm#b1.ra in the other.
It is like using female * race versus using race * female.
I'd be curious to know if the two models did produce identical fits.
That would indicate whether the parameterizations are equivalent. If
not, then something is getting screwed up.
I suspect using ## instead of # might solve the problem -- and that
would be my preference anyway.
The following code also produces inconsistent results, with the 3rd
model being wrong. It isn't clear to me why that is the case.
use "http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta", clear
ologit warm yr89#male, nolog
ologit warm b0.male#b1.yr89, nolog
ologit warm b1.yr89#b0.male, nolog
I hate to accuse Stata of having a bug, but I am starting to wonder...
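One way to run the fit comparison suggested above is to store the three ologit models and tabulate their log likelihoods together. This is a sketch, not a diagnosis: m1, m2, and m3 are arbitrary names chosen for illustration, and it assumes the dataset URL above is reachable.

```stata
use "http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta", clear

* Fit the three specifications from the message and keep each result
quietly ologit warm yr89#male
estimates store m1
quietly ologit warm b0.male#b1.yr89
estimates store m2
quietly ologit warm b1.yr89#b0.male
estimates store m3

* Compare sample size, log likelihood, and rank across the models
estimates table m1 m2 m3, stats(N ll rank)
```

Equivalent parameterizations should give the same log likelihood and rank in every column. A model that differs on either is not fitting the same thing as the others.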
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/