Thomas Weichle is interacting a continuous variable
with a factor (indicator) variable using the '#' and '##' operators, and asks
why the model fits are different:
> I'm trying to create an interaction with a continuous variable and a
> factor variable using c. as the prefix for the continuous variable. How
> come I receive different results when using the following model syntax?
> I understand that some of the output interpretations are slightly
> different but things like the log likelihood, LR chi2, and degrees of
> freedom should be the same.
>
> Model 1:
> stcox chemo#c.income
>
> Model 2:
> stcox chemo##c.income
>
> Model 2 is equivalent to the following model:
> gen chemo_income = chemo*income
> stcox chemo income chemo_income
>
> Model 1 contains exactly 1 less degree of freedom. It is not clear to
> me why Model 1 and Model 2 aren't equivalent.
>
> I was able to demonstrate that when interacting 2 factor variables the
> following models would be equivalent:
> stcox chemo#male
> stcox chemo##male
>
> However, I'm having a hard time showing the equivalency of interacting a
> continuous and factor variable.
Let's use the auto data, interacting -foreign- (a 0-1 indicator variable) with
-turn-, so that our example roughly parallels Thomas's. The only other
difference is that we'll use -regress- instead of -stcox-.
. sysuse auto
. gen dt = (foreign==0) * turn
. gen ft = (foreign==1) * turn
The two models are:
(1) . regress mpg for#c.turn
and
(2) . regress mpg for##c.turn
Model (1) is equivalent to
. regress mpg dt ft
and Model (2) is equivalent to
. regress mpg foreign turn dt ft
The only substantive difference between these two models is the inclusion of
the main effect of -foreign- in Model (2). We could take -turn- out of Model (2)
without affecting the model fit, because -turn- is collinear with -dt- and -ft-
(turn = dt + ft for every observation). In fact, since -turn-, -dt-, and -ft-
are collinear, we can remove any one of them from the model without affecting
the model fit (compare the MSE and the linear predictions to verify this).
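The collinearity claim can also be checked with plain linear algebra outside Stata. The Python sketch below is only an illustration under stated assumptions: it uses synthetic data in place of the auto data (the names mirror -foreign-, -turn-, -dt-, -ft-, and -mpg-, but the numbers are made up), and `ols_fit` is a hypothetical helper standing in for -regress- that returns the rank of the design (constant included) and the residual sum of squares. Dropping any one of -turn-, -dt-, or -ft- from Model (2) should leave both unchanged.

```python
import random


def ols_fit(columns, y, tol=1e-9):
    """Least-squares fit of y on the given columns (hypothetical helper).

    Returns (rank, rss). Columns are orthogonalized with Gram-Schmidt; a column
    whose remaining part is negligible relative to its own length is treated as
    collinear and skipped, loosely mimicking how -regress- omits variables.
    """
    basis = []
    for col in columns:
        v = list(col)
        for _ in range(2):  # a second pass reduces rounding error
            for q in basis:
                d = sum(a * b for a, b in zip(q, v))
                v = [a - d * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        size = sum(a * a for a in col) ** 0.5
        if size > 0 and norm > tol * size:
            basis.append([a / norm for a in v])
    resid = list(y)
    for q in basis:  # remove the projection of y onto each basis vector
        d = sum(a * b for a, b in zip(q, resid))
        resid = [a - d * b for a, b in zip(resid, q)]
    return len(basis), sum(r * r for r in resid)


# Synthetic stand-in for the auto data (made-up numbers, not the real data).
rng = random.Random(42)
n = 60
foreign = [1.0 if i % 3 == 0 else 0.0 for i in range(n)]
turn = [rng.uniform(31, 51) for _ in range(n)]
mpg = [45 - 0.6 * t + 10 * f - 0.1 * f * t + rng.gauss(0, 1)
       for f, t in zip(foreign, turn)]

const = [1.0] * n
dt = [(1 - f) * t for f, t in zip(foreign, turn)]  # turn for domestic cars, else 0
ft = [f * t for f, t in zip(foreign, turn)]         # turn for foreign cars, else 0

fits = {
    "full": ols_fit([const, foreign, turn, dt, ft], mpg),
    "drop turn": ols_fit([const, foreign, dt, ft], mpg),
    "drop dt": ols_fit([const, foreign, turn, ft], mpg),
    "drop ft": ols_fit([const, foreign, turn, dt], mpg),
}
for name, (rank, rss) in fits.items():
    print(f"{name:10s} rank={rank} rss={rss:.6f}")
```

Because turn = dt + ft holds exactly, each reduced design spans the same column space as the full one, so the fitted values, the rank, and therefore the MSE cannot change.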
-foreign- is not collinear with the constant, -dt-, and -ft- in Model (1), so
including it in Model (2) adds one estimable parameter (the extra degree of
freedom Thomas noticed) and yields a different model fit.
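The extra degree of freedom can be seen the same way. In the sketch below (again Python, with synthetic data standing in for the auto data and a hypothetical `ols_fit` helper rather than any Stata command), Model (1) is the constant plus -dt- and -ft-, and Model (2) adds -foreign- and -turn-. The rank should go up by exactly one, since -turn- contributes nothing new but -foreign- does, and the residual sum of squares should go down.

```python
import random


def ols_fit(columns, y, tol=1e-9):
    """Least-squares fit of y on the given columns (hypothetical helper).

    Returns (rank, rss); nearly collinear columns are skipped, loosely
    mimicking how -regress- omits variables.
    """
    basis = []
    for col in columns:
        v = list(col)
        for _ in range(2):  # a second pass reduces rounding error
            for q in basis:
                d = sum(a * b for a, b in zip(q, v))
                v = [a - d * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        size = sum(a * a for a in col) ** 0.5
        if size > 0 and norm > tol * size:
            basis.append([a / norm for a in v])
    resid = list(y)
    for q in basis:  # remove the projection of y onto each basis vector
        d = sum(a * b for a, b in zip(q, resid))
        resid = [a - d * b for a, b in zip(resid, q)]
    return len(basis), sum(r * r for r in resid)


# Synthetic stand-in for the auto data (made-up numbers, not the real data).
rng = random.Random(42)
n = 60
foreign = [1.0 if i % 3 == 0 else 0.0 for i in range(n)]
turn = [rng.uniform(31, 51) for _ in range(n)]
mpg = [45 - 0.6 * t + 10 * f - 0.1 * f * t + rng.gauss(0, 1)
       for f, t in zip(foreign, turn)]

const = [1.0] * n
dt = [(1 - f) * t for f, t in zip(foreign, turn)]
ft = [f * t for f, t in zip(foreign, turn)]

model1 = ols_fit([const, dt, ft], mpg)                 # regress mpg dt ft
model2 = ols_fit([const, foreign, turn, dt, ft], mpg)  # regress mpg foreign turn dt ft
print("Model (1): rank=%d rss=%.4f" % model1)
print("Model (2): rank=%d rss=%.4f" % model2)
```

Model (1) is nested in Model (2), so its residual sum of squares can never be smaller; the one-unit rank difference is the degree of freedom that separates the two fits.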
Now, when we change the model to an interaction between two factor
variables, such as -foreign- and -rep78-, we see that
(3) . regress mpg for#rep
and
(4) . regress mpg for##rep
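yield equivalent model fits. This is because the level indicators of -foreign-
are collinear with the cell indicators in -for#rep- (each is a sum of cells),
and so are the level indicators of -rep78-; the main effects add nothing new.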
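A similar sketch covers the factor-by-factor case, under the same assumptions (Python, synthetic data, hypothetical `ols_fit` helper). Model (3) uses the constant plus one indicator per -foreign- by -rep78- cell; Model (4) uses the constant, the main-effect indicators, and the non-base interaction indicators, roughly as a factor-variable expansion with base levels omitted would. Both designs span the same space, so the rank and the residual sum of squares should agree.

```python
import random


def ols_fit(columns, y, tol=1e-9):
    """Least-squares fit of y on the given columns (hypothetical helper).

    Returns (rank, rss); nearly collinear columns are skipped, loosely
    mimicking how -regress- omits variables.
    """
    basis = []
    for col in columns:
        v = list(col)
        for _ in range(2):  # a second pass reduces rounding error
            for q in basis:
                d = sum(a * b for a, b in zip(q, v))
                v = [a - d * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        size = sum(a * a for a in col) ** 0.5
        if size > 0 and norm > tol * size:
            basis.append([a / norm for a in v])
    resid = list(y)
    for q in basis:  # remove the projection of y onto each basis vector
        d = sum(a * b for a, b in zip(q, resid))
        resid = [a - d * b for a, b in zip(resid, q)]
    return len(basis), sum(r * r for r in resid)


# Synthetic stand-in for -foreign- (0/1) and -rep78- (1-5); each cell has 6 cars.
rng = random.Random(7)
n = 60
foreign = [i % 2 for i in range(n)]
rep78 = [1 + (i // 2) % 5 for i in range(n)]
mpg = [18 + 4 * f + 1.5 * r + (3 if (f, r) == (1, 5) else 0) + rng.gauss(0, 1.5)
       for f, r in zip(foreign, rep78)]

const = [1.0] * n
# Model (3), for#rep: one indicator per cell; ols_fit drops the one cell
# indicator that is collinear with the constant.
cells = [[1.0 if (f, r) == (a, b) else 0.0 for f, r in zip(foreign, rep78)]
         for a in (0, 1) for b in range(1, 6)]
# Model (4), for##rep: main effects plus interactions, base levels omitted.
main_f = [float(f) for f in foreign]
main_r = [[1.0 if r == b else 0.0 for r in rep78] for b in range(2, 6)]
inter = [[f * (1.0 if r == b else 0.0) for f, r in zip(foreign, rep78)]
         for b in range(2, 6)]

model3 = ols_fit([const] + cells, mpg)
model4 = ols_fit([const, main_f] + main_r + inter, mpg)
print("Model (3): rank=%d rss=%.4f" % model3)
print("Model (4): rank=%d rss=%.4f" % model4)
```

Here the main-effect columns are exact sums of cell indicators, which is why adding them to the cell model changes neither the rank nor the fit.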
--Jeff
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/