st: Interaction terms interpretation when one variable is omitted
From: "Mirnezami, Oliver" <[email protected]>
To: "[email protected]" <[email protected]>
Date: Thu, 11 Apr 2013 11:14:06 +0000
Hello
I have a query regarding the interpretation of an interaction term when Stata automatically omits a variable from the regression due to collinearity.
I am looking at how job loss affects health and wish to extend my model to see whether, when an individual loses their job, re-employment moderates the negative effect on their health.
To do this, I have interacted my treatment variable (1 for individuals who have reported job loss in the current wave, 0 for individuals employed in the current wave) with the individual's labour force status.
For example:
gen treat_employed = treat * employed
gen treat_unemployed = treat * unemployed
gen treat_retired = treat * retired
In the first case, my regression is then (n.b. other controls are left out here for simplicity):
xtreg health treat employed treat_employed, fe
However, the interaction term treat_employed gets omitted. I then tried running the following regressions separately (each with just 2 of the 3 variables) and found that the coefficient and standard error on employed are the same as those on treat_employed (the interaction term):
              |               Robust
       health |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        treat |  -.0353416   .0370996    -0.95   0.341    -.1080636    .0373803
     employed |   .1540951   .0679695     2.27   0.023     .0208624    .2873278
        _cons |     3.4245   .0677945    50.51   0.000     3.291611     3.55739

               |               Robust
    sr_health1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
         treat |  -.1894367   .0585036    -3.24   0.001    -.3041146   -.0747589
treat_employed |   .1540951   .0679695     2.27   0.023     .0208624    .2873278
         _cons |   3.578596   .0007682  4658.40   0.000      3.57709    3.580101
An example of my data is as follows:
Id    Year   Employed   Treatment   Interaction term (employed * treatment)
001   1996   1          0           0
001   1998   1          0           0
001   2000   1          0           0
001   2002   0          1           0
001   2004   1          0           0
001   2006   1          0           0
001   2008   1          1           1
001   2010   1          0           0
I think the problem arises because employment and treatment are not independent of each other: employed always equals 1 when treatment equals 0 by construction, as my control group is people with a job. When treatment equals 1 (i.e. an individual reports job loss in this wave), however, the individual can be employed or unemployed (or in fact have any labour force status), because the job loss occurred at some point between the previous interview wave and this one, so they may already have found a new job. I wish to see whether the effect on health depends on which labour force status an individual has following job loss.
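One way to check this suspicion is sketched below, assuming the variable names used earlier in this post. If employed is always 1 whenever treat is 0, then treat_employed equals treat + employed - 1 on every observation, which would make it an exact linear combination of treat, employed and the constant.

```stata
* Sketch only: variable names follow the post; adapt to the actual dataset.
* Should report 0 if every control-group observation is employed.
count if treat == 0 & employed == 0

* Stops with an error if the identity fails on any non-missing observation.
assert treat_employed == treat + employed - 1 if !missing(treat, employed)

* _rmcoll lists the variables it would drop because of collinearity.
_rmcoll treat employed treat_employed
```

If the assert passes, the two regressions shown above would be the same model written in two ways. Substituting the identity gives an identical coefficient on employed and treat_employed (.1540951), and the two treat coefficients differ by exactly that amount: -.1894367 + .1540951 = -.0353416. The constants are related in the same way: 3.578596 - .1540951 is approximately 3.4245.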
I thought of an alternative approach to the problem and would be grateful for your feedback. Originally, my treatment variable could equal 1 for any labour force status of the individual. My new method creates separate treatment variables that all share the same control group. treat_emp equals 1 only when the individual is employed at the interview in which job loss is reported; treat_unemp and treat_ret equal 1 only when the individual is unemployed or retired, respectively, at that interview. My new method:
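local stubs "emp unemp ret"
foreach stub of local stubs {
    * stays missing when job loss is reported with a different status
    gen treat_`stub' = .
    * control group: no job loss reported in this wave
    replace treat_`stub' = 0 if treat == 0
    * treated: job loss reported and this labour force status at interview
    replace treat_`stub' = 1 if treat == 1 & `stub' == 1
}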
I then run a series of separate regressions and analyse the coefficient on each treatment variable separately. I found, for example, that the coefficient on treat_unemp is twice as large as that on treat_emp, which makes intuitive sense to me. Can I make comparisons across regressions in this way when the regressions are identical apart from the treatment variable included in each? My thinking is that the original treatment variable is, in a sense, some kind of average of the separate treatment variables, whereas now I am examining each case separately to see how they differ across regressions.
xtreg health treat_emp, fe
xtreg health treat_unemp, fe
xtreg health treat_ret, fe
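For comparison, here is a minimal sketch of one possible single-regression alternative, in which the coefficients can be tested against each other directly. The variable treat_status is hypothetical and not part of my data. The sketch assumes Stata 11 or later for factor-variable notation, and its estimates need not match those from the separate regressions, because the estimation sample differs.

```stata
* Sketch only: treat_status is a hypothetical variable, not from the post.
* 0 = control, 1 = job loss and employed, 2 = unemployed, 3 = retired.
gen byte treat_status = 0 if treat == 0
replace treat_status = 1 if treat == 1 & employed == 1
replace treat_status = 2 if treat == 1 & unemployed == 1
replace treat_status = 3 if treat == 1 & retired == 1

* One fixed-effects regression with the control group as the base category.
xtreg health i.treat_status, fe vce(robust)

* Wald test of equal effects for the re-employed and the unemployed.
test 1.treat_status = 2.treat_status
```

As in the loop above, observations reporting job loss with any other labour force status stay missing and drop out of the estimation sample.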
Is this alternate method acceptable to use? I'm just concerned because previously I have always been taught to use interaction terms.
Incidentally, I found a query on interaction terms raised a few days ago by Nahla Betelmal very helpful as a starting point. David Hoaglin and Richard Williams generated a lot of discussion, which was interesting to read. My query, however, is specifically about what to do when one of the variables is omitted, which I don't think was covered, and about whether my alternative approach is acceptable or should be disregarded.
I would really appreciate any advice that you can offer. Apologies for the longwinded explanation.
Kind regards
Oliver