Re: st: Interacting continuous variable with a factor variable using c. as the prefix for the continuous variable

From	[email protected] (Jeff Pitblado, StataCorp LP)
To	[email protected]
Subject	Re: st: Interacting continuous variable with a factor variable using c. as the prefix for the continuous variable
Date	Fri, 08 Jan 2010 15:13:11 -0600

Thomas Weichle <[email protected]> is interacting a continuous variable
with a factor (indicator) variable using the '#' and '##' operators, and asks
why the model fits are different:

> I'm trying to create an interaction with a continuous variable and a
> factor variable using c. as the prefix for the continuous variable.  How
> come I receive different results when using the following model syntax?
> I understand that some of the output interpretations are slightly
> different but things like the log likelihood, LR chi2, and degrees of
> freedom should be the same.
> 
> Model 1:
> stcox chemo#c.income
> 
> Model 2:
> stcox chemo##c.income
> 
> Model 2 is equivalent to the following model:
> gen chemo_income = chemo*income
> stcox chemo income chemo_income
> 
> Model 1 contains exactly 1 less degree of freedom.  It is not clear to
> me why Model 1 and Model 2 aren't equivalent.
> 
> I was able to demonstrate that when interacting 2 factor variables the
> following models would be equivalent:
> stcox chemo#male
> stcox chemo##male
> 
> However, I'm having a hard time showing the equivalency of interacting a
> continuous and factor variable.

Let's use the auto data, interacting -foreign- (a 0-1 indicator variable) with
-turn-, so that our example somewhat lines up with Thomas'.  The only other
difference is that we'll use -regress- instead of -stcox-.

	. sysuse auto
	. gen dt = (foreign==0) * turn
	. gen ft = (foreign==1) * turn

The basic model fits are:

(1)	. regress mpg for#c.turn

and

(2)	. regress mpg for##c.turn

Model (1) is equivalent to

	. regress mpg dt ft

and Model (2) is equivalent to 

	. regress mpg foreign turn dt ft

The only difference between these two models is the inclusion of the main
effect of -foreign- in Model (2).  We could take -turn- out of Model (2)
without affecting the model fit because -turn- is collinear with -dt- and
-ft- (turn = dt + ft in every observation); in fact, since -turn-, -dt-, and
-ft- are collinear, we can remove any one of them from the model without
affecting the model fit (we can look at the MSE and linear predictions to
verify this).
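
A minimal sketch of that check, assuming the auto data as above; the names
-xb_all- and -xb_noturn- are made up for illustration.  The first fit will
note that one of the collinear variables was omitted; the root MSE and the
linear predictions should agree either way:

	. sysuse auto, clear
	. gen dt = (foreign==0) * turn
	. gen ft = (foreign==1) * turn
	. * Model (2) written out with all four variables
	. regress mpg foreign turn dt ft
	. predict double xb_all
	. display e(rmse)
	. * the same model with -turn- removed by hand
	. regress mpg foreign dt ft
	. predict double xb_noturn
	. display e(rmse)
	. * errors out if the linear predictions differ beyond rounding
	. assert reldif(xb_all, xb_noturn) < 1e-6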

-foreign- is not collinear with any other variable in Model (1), thus
including it in Model (2) yields a different model fit and accounts for the
one extra degree of freedom.
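
As a rough illustration, adding the main effect of -foreign- to Model (1) by
hand should reproduce the fit of Model (2).  The stored-estimate names -m1-,
-m1f-, and -m2- are arbitrary choices for this sketch:

	. sysuse auto, clear
	. * Model (1): separate -turn- slopes, no main effect of -foreign-
	. quietly regress mpg i.foreign#c.turn
	. estimates store m1
	. * Model (1) plus the main effect of -foreign-
	. quietly regress mpg i.foreign i.foreign#c.turn
	. estimates store m1f
	. * Model (2)
	. quietly regress mpg i.foreign##c.turn
	. estimates store m2
	. * compare model degrees of freedom and root MSE side by side
	. estimates table m1 m1f m2, stats(N df_m rmse)

The coefficients in -m1f- and -m2- are parameterized differently, so we
compare the fit statistics rather than the individual estimates.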

Now when we change the model to be the interaction between two factor
variables, such as -foreign- and -rep78-, we see that

(3)	. regress mpg for#rep

and

(4)	. regress mpg for##rep

yield equivalent model fits.  This is because -foreign- is collinear with
the level variables in -for#rep-, and so is -rep78-.
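
One way to confirm this, sketched under the same assumptions as before (the
names -m3- and -m4- are arbitrary), is to store both fits and compare their
degrees of freedom and root MSE:

	. sysuse auto, clear
	. * Model (3): cell indicators only
	. quietly regress mpg i.foreign#i.rep78
	. estimates store m3
	. * Model (4): main effects plus interaction
	. quietly regress mpg i.foreign##i.rep78
	. estimates store m4
	. estimates table m3 m4, stats(N df_m rmse)

-rep78- has missing values in the auto data, but both fits use the same
estimation sample, so the statistics are directly comparable.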

--Jeff
[email protected]