Re: Re-re-post: Stata 11 - Factor variables in a regression command
From: Michael Norman Mitchell <[email protected]>
Subject: Re: Re-re-post: Stata 11 - Factor variables in a regression command
Date: Sat, 01 May 2010 10:31:05 -0700
Greetings
Richard Williams wrote...
--- snip ---
As the original example shows, the fits produced by the first two
syntaxes are identical.
--- snip ---
I completely agree with Richard that
. logistic y a#b
and
. logistic y a##b
are two different ways of parameterizing a model with two categorical predictors. If we let factor a have A levels, and factor b have B levels, then both models will have
(A-1) + (B-1) + (A-1)*(B-1)
parameters. In fact, this shows how the traditional parameterization (i.a i.b a#b) decomposes those parameters into "main effect of a" (A-1 df), "main effect of b" (B-1 df), and "a by b interaction" ((A-1)*(B-1) df).
If, instead, one specifies -a#b- alone, this single term has (A-1) + (B-1) + (A-1)*(B-1) degrees of freedom, and it is no longer partitioned into a main effect of a, a main effect of b, and an a by b interaction. The omnibus test of this effect is the overall test of the null hypothesis that there is simultaneously no main effect of a, no main effect of b, and no a by b interaction. As I show below, it simply tests the equality of the means in all of the cells. I think this is rarely of research interest when one has this kind of "factorial" layout.
So, if this is what the omnibus test is doing, what about the individual parameters? Looking at Ricardo's initial example:
----------------------------------------------------------------------------
y | Odds Ratio Std. Err. z P>|z| [95% Conf. Int.]
-----------+----------------------------------------------------------------
a#b |
0 1 | 1.567419 .2804138 2.51 0.012 1.1038 2.2256
1 0 | 1.447424 .2588797 2.07 0.039 1.0194 2.0551
1 1 | 1.211988 .2246236 1.04 0.300 .84283 1.7428
----------------------------------------------------------------------------
Note how this is much like a "oneway" layout of the data, where there are four groups and one of them is omitted (the group a=0 b=0). So, each of these parameters tests whether that "cell" differs from the omitted cell. That is, the first parameter tests whether the cell labeled a=0 b=1 differs from the cell a=0 b=0. It is as though the design had been converted into four groups (labeled 1 2 3 4, with group 1, corresponding to a=0 b=0, as the omitted group). The tests then compare group 2 vs. 1, group 3 vs. 1, and group 4 vs. 1. The omnibus test of all the parameters, as noted above, tests the equality of all of the cell means.
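One could make that conversion explicit. A sketch, where the variable name "cell" is made up:
. egen cell = group(a b)     // 1 = (a=0,b=0), ..., 4 = (a=1,b=1)
. logistic y i.cell          // same fit; groups 2, 3, 4 each compared to group 1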
Returning to Richard's point: as he notes, this is just an alternative parameterization of the original model, one where each cell is compared to a reference cell. If this is the series of comparisons a researcher wants to make, it is a very useful parameterization.
I hope that is useful to Ricardo, and any other readers,
Best regards,
Michael
Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
On 2010-05-01 8:50 AM, Richard Williams wrote:
At 01:42 AM 5/1/2010, Michael Norman Mitchell wrote:
Dear Ricardo
The command
. logistic y a#b
includes just the "a by b" interaction, and includes neither the main effect of a nor the main effect of b. By contrast, the
command
. logistic y a##b
includes the main effect of a, the main effect of b, as well as
the a by b interaction. It is equivalent to typing
. logistic y a#b a b
I don't think this is quite right. As the original example shows, the fits produced by the first two syntaxes are identical. So, a#b and a##b are different ways of parameterizing the same model. a##b gives you the main effect of a, the main effect of b, and the interaction; i.e., it is the same as entering a, b, and a*b in the model, where a*b = 1 if a and b both equal 1, and 0 otherwise. I believe this is equivalent to your 3rd syntax, except I would say i.a and i.b so Stata knows these are categorical variables.
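For instance, that equivalence could be verified by building the product term by hand. A sketch using the variables from the example below, where the name "yr89Xmale" is made up:
. gen byte yr89Xmale = yr89*male
. logit warmlt2 i.yr89 i.male yr89Xmale, nolog     // should reproduce the a##b fit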
With a#b, there are four possible combinations of values: 0 0, 0 1, 1
0, and 1 1. The first gets dropped and the other three are in the
model.
These are two parameterizations of the same model; personally I
prefer the a##b approach because it separates main effects from
interaction effects.
The following example illustrates the 3 different approaches, and
shows the equivalence of the last 2 approaches in Michael's example:
. use "http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta",
clear
(77 & 89 General Social Survey)
. logit warmlt2 yr89#male, nolog

Logistic regression                               Number of obs   =       2293
                                                  LR chi2(3)      =      64.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -851.54241                       Pseudo R2       =     0.0366

------------------------------------------------------------------------------
     warmlt2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   yr89#male |
        0 1  |   .1816812   .1431068     1.27   0.204     -.098803    .4621655
        1 0  |  -1.295833    .229115    -5.66   0.000     -1.74489   -.8467762
        1 1  |   -.659902   .2022755    -3.26   0.001    -1.056355   -.2634493
             |
       _cons |  -1.667376   .1021154   -16.33   0.000    -1.867518   -1.467233
------------------------------------------------------------------------------
. logit warmlt2 yr89##male, nolog

Logistic regression                               Number of obs   =       2293
                                                  LR chi2(3)      =      64.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -851.54241                       Pseudo R2       =     0.0366

------------------------------------------------------------------------------
     warmlt2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.yr89 |  -1.295833    .229115    -5.66   0.000     -1.74489   -.8467762
      1.male |   .1816812   .1431068     1.27   0.204     -.098803    .4621655
             |
   yr89#male |
        1 1  |   .4542502   .3050139     1.49   0.136    -.1435661    1.052066
             |
       _cons |  -1.667376   .1021154   -16.33   0.000    -1.867518   -1.467233
------------------------------------------------------------------------------
. logit warmlt2 i.yr89 i.male yr89#male, nolog

Logistic regression                               Number of obs   =       2293
                                                  LR chi2(3)      =      64.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -851.54241                       Pseudo R2       =     0.0366

------------------------------------------------------------------------------
     warmlt2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.yr89 |  -1.295833    .229115    -5.66   0.000     -1.74489   -.8467762
      1.male |   .1816812   .1431068     1.27   0.204     -.098803    .4621655
             |
   yr89#male |
        1 1  |   .4542502   .3050139     1.49   0.136    -.1435661    1.052066
             |
       _cons |  -1.667376   .1021154   -16.33   0.000    -1.867518   -1.467233
------------------------------------------------------------------------------
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*