Re: Re-re-post: Stata 11 - Factor variables in a regression command
From: Michael Norman Mitchell <[email protected]>
To: [email protected]
Subject: Re: Re-re-post: Stata 11 - Factor variables in a regression command
Date: Fri, 30 Apr 2010 23:42:50 -0700
Dear Ricardo
The command
. logistic y a#b
includes just the "a by b" interaction; it includes neither the main
effect of a nor the main effect of b. By contrast, the command
. logistic y a##b
includes the main effect of a, the main effect of b, and the a by b interaction. It is equivalent to typing
. logistic y a#b a b
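As a quick check, here is a minimal sketch using simulated data (the names y, a, and b follow your posting; the data themselves are invented purely for illustration). Both specifications report the same log likelihood:

    * simulated 0/1 predictors and outcome, for illustration only
    clear
    set seed 2010
    set obs 5000
    generate byte a = runiform() < .5
    generate byte b = runiform() < .5
    generate byte y = runiform() < invlogit(-2 + .3*a + .4*b - .5*a*b)

    * full factorial: main effects of a and b plus the a#b interaction
    logistic y a##b
    display e(ll)

    * the same model written out term by term
    * (Stata may flag some terms as collinear and omit them, but the fit is identical)
    logistic y a#b a b
    display e(ll)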
As John Fox describes in his regression book, a properly formed
regression model that contains an interaction should also include all of
the lower-order main effects. In other words, when including a#b, you
should also include a and b. There are instances where one could omit the
main effects, but only if you know exactly why you are doing so and
understand the ramifications for the interpretation of the terms in
the model.
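To make concrete what -logistic y a#b- by itself is fitting: with the (a=0, b=0) cell as the reference, it enters one indicator for each remaining cell of the a-by-b crossing. Here is a hedged sketch, continuing the simulated data above, that rebuilds those indicators by hand (the cell* names are only illustrative):

    * one indicator per non-reference cell of the a-by-b crossing
    generate byte cell01 = (a == 0) & (b == 1)
    generate byte cell10 = (a == 1) & (b == 0)
    generate byte cell11 = (a == 1) & (b == 1)

    logistic y cell01 cell10 cell11    // same fit and odds ratios as -logistic y a#b-
    logistic y a#b

In your output below the two parameterizations line up the same way: the (0 1) and (1 0) odds ratios in the first regression match 1.b and 1.a in the second, and the (1 1) odds ratio in the first is the product of the three odds ratios in the second (1.447424 x 1.567419 x 0.5342167, which is about 1.212).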
I hope that is helpful.
Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
On 2010-04-30 10.48 PM, Ricardo Basurto wrote:
Not the best way to start posting to Statalist, is it? I am
re-arranging my message hoping that at least that way my question
won't be cut out. (If anyone has suggestions on how to successfully
submit messages from within Gmail, I would appreciate those as well.)
--------------------------------------------------------------------------------------------------------------------------------------------------------
I am having trouble understanding the difference between a regression
that uses a cross operator (#) and one that uses a cross factorial
operator (##).
For example, below is the output I get from running two different
regressions. From the log likelihood, the LR chi2, and so on, it seems
clear to me that both commands are fitting the same regression model.
Also, I can reproduce the second regression by fitting a regression with
dummies for a=1, b=1, and a variable equal to the product of those two
dummies; however, I just can't figure out what exact model is being
fitted in the first regression. Can anyone explain this?
Thank you,
Ricardo
REGRESSION #1:

. logistic y a#b

Logistic regression                               Number of obs   =      19670
                                                  LR chi2(3)      =       7.71
                                                  Prob > chi2     =     0.0525
Log likelihood = -1473.1898                       Pseudo R2       =     0.0026

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         a#b |
        0 1  |   1.567419   .2804138     2.51   0.012       1.1038      2.2256
        1 0  |   1.447424   .2588797     2.07   0.039       1.0194      2.0551
        1 1  |   1.211988   .2246236     1.04   0.300       .84283      1.7428
------------------------------------------------------------------------------
REGRESSION #2:

. logistic y a##b

Logistic regression                               Number of obs   =      19670
                                                  LR chi2(3)      =       7.71
                                                  Prob > chi2     =     0.0525
Log likelihood = -1473.1898                       Pseudo R2       =     0.0026

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         1.a |   1.447424   .2588797     2.07   0.039       1.0194      2.0551
         1.b |   1.567419   .2804138     2.51   0.012       1.1038      2.2256
             |
         a#b |
        1 1  |   .5342167   .1302597    -2.57   0.010       .33125      .86152
------------------------------------------------------------------------------
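
For reference, the by-hand reproduction of the second regression described above can be sketched like this (a1, b1, and a1b1 are illustrative names; a and b are assumed to be nonmissing 0/1 variables):

    * dummies for a==1, for b==1, and for their product
    generate byte a1   = (a == 1)
    generate byte b1   = (b == 1)
    generate byte a1b1 = a1 * b1

    logistic y a1 b1 a1b1    // same fit and odds ratios as -logistic y a##b-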
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*