Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Understanding Factor variables - is order significant ?
From: "Michael N. Mitchell" <[email protected]>
To: [email protected]
Subject: Re: st: Understanding Factor variables - is order significant ?
Date: Tue, 25 May 2010 20:50:35 -0700
Dear Richard
I now see what you are talking about! I am confused by this as well. So, I switched to
a more sensible dataset for poisson regression, using the UCLA ATS example, as shown below...
. use http://www.ats.ucla.edu/stat/stata/dae/poissonreg, clear
(Two Los Angeles High Schools)
. gen himath = math > 50
With the term -ib1.himath#ib0.male-, it is as though four groups are entered, with the
group himath=1, male=0 (the cell missing from the table below) as the reference group.
. regress daysabs ib1.himath#ib0.male
Source | SS df MS Number of obs = 316
-------------+------------------------------ F( 3, 312) = 7.07
Model | 1112.497 3 370.832335 Prob > F = 0.0001
Residual | 16366.1106 312 52.4554827 R-squared = 0.0636
-------------+------------------------------ Adj R-squared = 0.0546
Total | 17478.6076 315 55.4876432 Root MSE = 7.2426
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
himath#male |
0 0 | 3.062032 1.139457 2.69 0.008 .8200395 5.304025
0 1 | 1.491369 1.159842 1.29 0.199 -.7907317 3.77347
1 1 | -2.010909 1.175009 -1.71 0.088 -4.322853 .3010348
|
_cons | 5.090909 .8253727 6.17 0.000 3.466909 6.714909
------------------------------------------------------------------------------
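A quick way to confirm which cell is the reference, and what each coefficient measures, is to list the base levels and tabulate the cell means directly. This is only a sketch: it assumes the dataset and the -himath- variable created above are still in memory, and that the -allbaselevels- display option is available in your version of Stata. In this four-cell model, each cell mean should equal _cons plus that cell's coefficient.

* show the omitted (base) cell explicitly in the coefficient table
regress daysabs ib1.himath#ib0.male, allbaselevels
* mean of daysabs in each himath-by-male cell, for comparison
tabulate himath male, summarize(daysabs) means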
Here I reproduce the results by explicitly entering the four groups (omitting group 3,
i.e. himath=1 male=0, via -ib3.group-), and we can see the results are the same...
. generate group = .
(316 missing values generated)
. replace group = 1 if himath==0 & male==0
(85 real changes made)
. replace group = 2 if himath==0 & male==1
(79 real changes made)
. replace group = 3 if himath==1 & male==0
(77 real changes made)
. replace group = 4 if himath==1 & male==1
(75 real changes made)
. regress daysabs ib3.group
Source | SS df MS Number of obs = 316
-------------+------------------------------ F( 3, 312) = 7.07
Model | 1112.497 3 370.832335 Prob > F = 0.0001
Residual | 16366.1106 312 52.4554827 R-squared = 0.0636
-------------+------------------------------ Adj R-squared = 0.0546
Total | 17478.6076 315 55.4876432 Root MSE = 7.2426
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
group |
1 | 3.062032 1.139457 2.69 0.008 .8200395 5.304025
2 | 1.491369 1.159842 1.29 0.199 -.7907317 3.77347
4 | -2.010909 1.175009 -1.71 0.088 -4.322853 .3010348
|
_cons | 5.090909 .8253727 6.17 0.000 3.466909 6.714909
------------------------------------------------------------------------------
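As an aside, the four -replace- commands can be cross-checked against -egen-'s group() function, which numbers the combinations of its arguments in sorted order. This is a sketch, not part of the original analysis; the variable name -group2- is made up for illustration, and it assumes -himath-, -male- and -group- exist as created above.

* hypothetical variable name; codes 1-4 follow the sort order of himath, then male
egen group2 = group(himath male)
* stops with an error if the hand-built coding and the egen coding disagree
assert group == group2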
So, as you suggest, let's try this using a -poisson- model. Here is the result
using -ib1.himath#ib0.male-. The coding still leaves the group himath=1, male=0 as
the reference group. But the results include a coefficient for himath=0, male=0 that
has no standard error, and the LR test has only 2 degrees of freedom. Does this also
occur when using -group-?
. poisson daysabs ib1.himath#ib0.male
Iteration 0: log likelihood = -1600.2092
Iteration 1: log likelihood = -1564.0399
Iteration 2: log likelihood = -1563.1148
Iteration 3: log likelihood = -1563.1144
Iteration 4: log likelihood = -1563.1144
Poisson regression Number of obs = 316
LR chi2(2) = 144.99
Prob > chi2 = 0.0000
Log likelihood = -1563.1144 Pseudo R2 = 0.0443
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
himath#male |
0 0 | .4701202 . . . . .
0 1 | -.017358 .0533361 -0.33 0.745 -.1218947 .0871788
1 1 | -.7768093 .0724615 -10.72 0.000 -.9188312 -.6347875
|
_cons | 1.901739 .0303588 62.64 0.000 1.842237 1.961241
------------------------------------------------------------------------------
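Here are the results for the poisson model, explicitly entering the four groups
(and omitting group 3 via -ib3.group-). The results are very different from those above:
the log likelihood is -1534.3618 rather than -1563.1144, the LR test has 3 degrees of
freedom rather than 2, and every coefficient now has a standard error.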
. poisson daysabs ib3.group
Iteration 0: log likelihood = -1534.3667
Iteration 1: log likelihood = -1534.3618
Iteration 2: log likelihood = -1534.3618
Poisson regression Number of obs = 316
LR chi2(3) = 202.49
Prob > chi2 = 0.0000
Log likelihood = -1534.3618 Pseudo R2 = 0.0619
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
group |
1 | .4709223 .0631983 7.45 0.000 .347056 .5947887
2 | .2569245 .0668887 3.84 0.000 .1258251 .388024
4 | -.5025268 .0829459 -6.06 0.000 -.6650978 -.3399558
|
_cons | 1.627456 .0505076 32.22 0.000 1.528463 1.72645
------------------------------------------------------------------------------
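One check that can be run on the numbers printed above is to exponentiate the Poisson linear predictors and compare them with the cell means implied by -regress-. The sketch below uses only values copied from the output in this post and the cell sizes 85 and 77 reported by the -replace- commands. A Poisson model with a separate parameter for every cell should reproduce the cell means, and the -ib3.group- fit appears to do so (roughly 8.15, 6.58, 5.09 and 3.08). In the -#- fit, exp(_cons) comes out near 6.70, which looks like the pooled mean of the two male=0 cells rather than the himath=1, male=0 mean. That suggests the himath=0, male=0 term is not actually being estimated there, but it does not explain why the two specifications are parameterized differently.

* cell means implied by -regress- (_cons plus each coefficient)
display 5.090909 + 3.062032
display 5.090909 + 1.491369
display 5.090909
display 5.090909 - 2.010909
* fitted means from -poisson daysabs ib3.group-
display exp(1.627456 + .4709223)
display exp(1.627456 + .2569245)
display exp(1.627456)
display exp(1.627456 - .5025268)
* fitted means from -poisson daysabs ib1.himath#ib0.male-, leaving out the 0 0 term
display exp(1.901739 - .017358)
display exp(1.901739)
display exp(1.901739 - .7768093)
* pooled mean of the two male=0 cells (85 and 77 observations)
display (85*(5.090909 + 3.062032) + 77*5.090909) / (85 + 77)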
This is a perplexing state of affairs! I don't know how to explain this!
I hope someone can help explain!
Michael N. Mitchell
Data Management Using Stata - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week - http://www.MichaelNormanMitchell.com
On 2010-05-25 8.43 PM, Richard Williams wrote:
At 09:19 PM 5/25/2010, Michael N. Mitchell wrote:
Dear Richard
I think I need to use my glasses!
Yes, Richard, you are exactly on target. It relates to the use of -#-
instead of -##- .
My previous answer is still true, in the sense that a#b and b#a give
you a different reference "cell", and thus the coding is
re-scrambled.
I agree with everything you say. But, it still isn't clear to me why it
should make a difference whether you use b1.ra#b0.dm versus b0.dm#b1.ra.
Tweaking your example, the following yield identical or equivalent
results for regress but not for poisson:
sysuse auto, clear
generate bigtrunk = trunk > 15
generate biglen = length > 190
regress mpg bigtrunk#biglen
regress mpg b1.bigtrunk#b0.biglen
regress mpg b0.biglen#b1.bigtrunk
poisson mpg bigtrunk#biglen, nolog
poisson mpg b1.bigtrunk#b0.biglen, nolog
poisson mpg b0.biglen#b1.bigtrunk, nolog
Use of ## in the last 2 commands avoids the problem, but why is there a
problem in the first place?
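To compare the specifications side by side, one option is to store each fit and tabulate the log likelihood and model degrees of freedom. This is a sketch only: the variable -cell- and the stored-estimate names are made up for illustration, and whether the fits disagree may depend on the Stata version. If the specifications are equivalent, the log likelihoods and degrees of freedom should match across all four columns.

sysuse auto, clear
generate bigtrunk = trunk > 15
generate biglen = length > 190
* hypothetical helper: one code per bigtrunk/biglen combination
egen cell = group(bigtrunk biglen)
poisson mpg b1.bigtrunk#b0.biglen, nolog
estimates store hash_tb
poisson mpg b0.biglen#b1.bigtrunk, nolog
estimates store hash_bt
poisson mpg b1.bigtrunk##b0.biglen, nolog
estimates store full
poisson mpg i.cell, nolog
estimates store cells
* ll = log likelihood, df_m = model degrees of freedom
estimates table hash_tb hash_bt full cells, stats(ll df_m N)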
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/