Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | vwiggins@stata.com (Vince Wiggins, StataCorp) |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Obtaining marginal effects and their standard errors after |
Date | Tue, 08 Jan 2013 17:50:12 -0600 |
Arne Risa Hole <arnehole@gmail.com> and Richard Williams <richardwilliams.ndu@gmail.com> have had an illuminating exchange about the computation and meaning of interaction effects on the probability of a positive outcome in models with a binary response. The discussion applies to any response that is not a linear combination of the coefficients, but let's stick with probabilities.

I have a few related thoughts and also want to show off some of -margins-' lesser-known features using Arne's clever examples.

Richard wonders "why margins does not provide marginal effects for interactions". We have nothing against so-called "interaction effects", though as Richard notes they are a funny kind of effect. You cannot change an interaction directly, you can only change its constituent pieces. (Ergo, why "interaction effects" are so called.)

You can, however, interpret an interaction, and as Arne notes, that interpretation is just the change in the slope with respect to one variable as the other variable changes,

                      d^2(y)
    interaction  =  ----------
                    d(x1)d(x2)

What I will dub "own interactions", an interaction of a variable with itself, have a long history in physics. The slope of distance with respect to time being velocity,

                    d(distance)
    velocity     =  -----------
                      d(time)

and, the interaction of time with itself being acceleration,

                    d^2(distance)     d^2(distance)
    acceleration =  --------------  =  -------------
                    d(time)d(time)      d(time)^2

An "own interaction" does not have the problem that we are required to think of changing the interaction itself. There is only one variable to change.

Moreover, we rarely have such nice descriptions of our interactions, own or otherwise. When we regress mileage on weight and weight squared, we are simply admitting that a linear relationship doesn't match the data, and we need some flexibility in the relationship between mileage and weight. We do not think that weight squared has its own interpretation.
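For anyone who wants to try the mileage example, here is a minimal sketch. It assumes the auto dataset that ships with Stata (variables mpg and weight); the grid of weights in at() is my own choice for illustration.

```stata
* Sketch only: mileage on weight and weight squared, auto dataset assumed
sysuse auto, clear
regress mpg c.weight##c.weight

* Slope of mpg with respect to weight at a few representative weights
margins, dydx(weight) at(weight=(2000(500)4500))
```

The slope on weight differs across the grid, which is all the squared term is doing.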
In such cases, I am a fan of visualizing the relationships over a range of meaningful values, rather than trying to create a single number that summarizes the "interaction effect". We know that the effects differ for different levels of the interacted variables and for different levels of other variables. Best to admit this and evaluate the response at different points. As Richard points out, "the problem with any `average' number (AME or MEM) is that it disguises a great deal of individual level variability ... That is why I like MERs (marginal effects at representative values), or else APRs (a term I made up) which stands for Adjusted Predictions at Representative Values." Me too. Richard's slides on using -margins- in this context should be required reading,

    http://www.nd.edu/~rwilliam/stats/Margins01.pdf

as should his Stata Journal article,

    http://www.statajournal.com/article.html?article=st0260

If you are trying to test whether an interaction term in your model is statistically significant, do that in the metric in which you estimated the model. That is to say, look at the test statistics on the interaction term.

One thing to keep in mind is that with a nonlinear response (e.g., probabilities in a probit or logit model) you have an interaction effect between your covariates even when you do not have an interaction term in the model. The probability is an S-shaped response in Xb, so, as any covariate changes, it pushes the response of the other covariates into either one of the tails, where the response is attenuated, or toward the center, where the response is strengthened. Try this example

    . webuse margex
    . probit outcome age distance
    . margins, dydx(age) at(distance=(0(100)800))
    . marginsplot

We estimated a model with no interaction, yet when we graph the marginal effect of age over a range of distances, we find a strong downward trend in the change in probability for a change in age as distance increases.

Even more fun, try this example,

    . clear
    . set seed 12345
    . set obs 5000
    . gen x = runiform() - .5
    . gen z = runiform() - .5
    . gen xb = x + 8*z
    . gen y = 1 / (1 + exp(xb)) < runiform()
    . logit y x z
    . margins, dydx(x) at(z=(-.5(.1).5))
    . marginsplot

Again, we have no interaction term in the model, but plenty of "interaction effect" on the probability. The marginal effect of x on probability traces out a nice bell-shaped curve as z increases: it first rises as z rises, then peaks and falls as z continues to rise.

The "interaction" is pronounced, the marginal effect rising from near 0 to about .25, then falling back to 0. Despite this pronounced "interaction", if we were to compute the average "interaction effect", it would be 0 (at least asymptotically). It is 0 because the positive and negative interactions sum to 0 in this example. This is directly analogous to the well-worn example of fitting a linear model to quadratic data and finding no relationship.

That is why I do not like to talk about continuous-continuous "interaction effects" as a single value. I would rather explore the MERs or APRs.

These graphs are as we would expect. Logit (and probit) probabilities look like,

    pr = f(Xb)

where f() is a monotonically increasing function of Xb that asymptotes to 0 as Xb -> -infinity and asymptotes to 1 as Xb -> +infinity. That is to say it is an S-shaped function in Xb. If z is a covariate in the set of covariates X, then,

    marginal effect of z = d(pr)/d(z) = d(pr)/d(Xb) * d(Xb)/d(z)

So, every marginal effect also includes a contribution from all other covariates in the model (the X in Xb). In fact d(pr)/d(Xb) will always map out the bell-shaped curve over a sufficient range of Xb. So, all logit and probit models have an interaction by construction, even when we do not introduce interaction terms.

These built-in interactions from nonlinear responses lie at the heart of Ai and Norton's (2003) protracted explorations of interactions.
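To make the built-in interaction explicit, here is a short derivation for the logit case with no interaction term in the model. The notation is my own shorthand rather than anything from the thread: \Lambda is the logistic function, p the probability, and \beta_x and \beta_z the coefficients on two covariates x and z.

```latex
\begin{align*}
p &= \Lambda(Xb) = \frac{1}{1 + e^{-Xb}} \\
\frac{\partial p}{\partial x} &= \beta_x \, \Lambda'(Xb) = \beta_x \, p(1-p) \\
\frac{\partial^2 p}{\partial x \, \partial z} &= \beta_x \beta_z \, \Lambda''(Xb) = \beta_x \beta_z \, p(1-p)(1-2p)
\end{align*}
```

The cross derivative is nonzero whenever p is away from .5, even though the model has no interaction term, and it changes sign at p = .5. In the simulated example the coefficient on x is about 1, so the marginal effect of x peaks near 1/4 where p = .5, in line with the .25 seen in the graph. Because x and z are both centered on 0 there, the positive and negative cross derivatives cancel on average.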
These nonlinearities do not exist in the natural metric of the model. If we think of the response of the probit model as being a one-standard-deviation change in the latent response (index value if you prefer GLM), then we have no nonlinearities, and we can directly interpret our coefficients. The case is even more compelling for logistic models, where the parameter estimates can be expressed as odds ratios that do not change as the levels of other variables change. Maarten Buis has championed this approach many times on the list, e.g.,

    http://www.stata.com/statalist/archive/2010-08/msg00968.html

with reference to an associated Stata Journal article,

    http://www.maartenbuis.nl/publications/interactions.html

Even so, changes in probability, or another nonlinear response, can often be useful in characterizing a model.

And, you say, you still want an "interaction effect" on a nonlinear response. -margins- can directly compute these effects for any number of interactions of indicator or factor-variable covariates and for interactions of those with continuous covariates. It cannot directly compute the effects of continuous-continuous interactions. Given what we have seen above, I contend that continuous-continuous interactions are the least useful interactions and those most likely to obscure important relationships.

That said, Arne has shown how to creatively use -margins- to numerically compute the pieces of a continuous-continuous interaction, and then assemble the interaction yourself. I have a simplification of Arne's example for those wanting the effects computed at the means of the covariates.

Set up the dataset, and run the probit model

    . sysuse auto, clear
    . replace weight=weight/1000
    . replace length=length/10
    . probit foreign weight length c.weight#c.length, nolog

Rather than,

    . margins, dydx(*) atmeans at(weight=3.019559)
    . matrix b = r(b)
    . scalar meff_turn_1 = b[1,2]
    . margins, dydx(*) atmeans at(weight=3.019459)
    . matrix b = r(b)
    . scalar meff_turn_0 = b[1,2]
    . di (meff_turn_1 - meff_turn_0) / 0.0001

you could use the -margins- contrast operator to take the difference between the marginal effect for the two values of weight,

    . margins, dydx(length) atmeans at(weight=3.019459) at(weight=3.019559) contrast(atcontrast(r._at)) post
    . margins, coeflegend
    . nlcom _b[r2vs1._at] / .0001

One tricky part of the -margins- command is -at(weight=3.019459) at(weight=3.019559)-. We are simply evaluating the derivative -dydx(length)- at the mean of weight and at the mean of weight plus a small epsilon, so we can numerically take the cross derivative w.r.t. weight.

A second tricky part is -contrast(atcontrast(r._at))-. We are asking for the contrast (difference) in the two at() values we specified for weight.

We use the -post- option of -margins- to post the results as estimation results, then use -nlcom- to divide by our epsilon.

I typed -margins, coeflegend- only because we would never know that we need to refer to the estimated difference as _b[r2vs1._at] without that legend.

The simplified technique has the added benefit of providing confidence intervals on the estimate.

Given that we know the exact form of the probability, we would still get the most accurate results using the method described in the FAQ that led to the original question in this thread,

    http://www.stata.com/support/faqs/statistics/marginal-effects-after-interactions/

Although I agree with Arne that the numerical example using -margins- is mostly pedagogical, I admit that in the dark ages, before -margins- existed, I regularly performed such computations. With a little sensitivity testing of the epsilon used to compute the derivative (.0001 above), these can be accurate estimates.

We can use Arne's example of a continuous-factor interaction to show how to estimate the "interaction effect" using only -margins-. I am again showing Arne's full example, because it makes clear what -margins- is computing.
Set up the dataset, and run the probit model

    . sysuse auto, clear
    . set seed 12345
    . generate dum=runiform()>0.5
    . probit foreign turn i.dum i.dum#c.turn, nolog

Rather than,

    . margins, dydx(*) atmeans at(dum=1)
    . matrix b = r(b)
    . scalar meff_turn_dum1 = b[1,1]
    . margins, dydx(*) atmeans at(dum=0)
    . matrix b = r(b)
    . scalar meff_turn_dum0 = b[1,1]
    . di meff_turn_dum1 - meff_turn_dum0

use -margins-' contrast operator to compute the interaction.

    . margins r.dum, dydx(turn) atmeans

With this approach, we can remove the -atmeans- option and estimate the average "interaction effect", rather than the "interaction effect" at the means,

    . margins r.dum, dydx(turn)

These "interaction effects" do not bother me in the same way a continuous-continuous "interaction effect" does. Why? Because there are only two values for the variable dum. That means we have completely explored the interaction space of the two variables dum and turn. It does not mean that we have explored how the marginal effect of turn varies with its own values or those of other covariates in the model, and that is why I would still look at the MERs and APRs.

Factor-factor "interaction effects" can also be estimated using the contrast operators. For the model,

    . logit y A##B ...

type,

    . margins r.A#r.B

to estimate the average "interaction effect" or, to estimate the "interaction effect" at the means, type

    . margins r.A#r.B, atmeans

This naturally extends to multiway interactions,

    . logit y A##B##C ...
    . margins r.A#r.B#r.C

Again, these "interaction effects" do not bother me in the way continuous-continuous interactions do. With factor variables, the interactions are exploring the complete space of results. Even so, I still like to look at the margins (estimated means),

    . margins A#B#C

It has been my experience that the contrast operators and other contrast features added to -margins- in Stata 12 have gone largely unnoticed. I am glad Arne's examples provided a platform to demonstrate what they do.
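For a runnable instance of the factor-factor case, here is a sketch that assumes the margex dataset used earlier, with its factor variables sex and group and the covariate age. The choice of these variables is mine for illustration, not something from the thread.

```stata
* Sketch only: factor-factor "interaction effect" on the probability
webuse margex, clear
logit outcome i.sex##i.group age

* Average "interaction effect": contrasts of contrasts against the base levels
margins r.sex#r.group

* The same contrasts evaluated at the means of the covariates
margins r.sex#r.group, atmeans

* The margins (predicted probabilities) behind those contrasts
margins sex#group
```

Because group has more than two levels, -margins- reports one contrast for each non-base level of group rather than a single number.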
-- Vince
   vwiggins@stata.com

Ai, C. R. and E. C. Norton. 2003. Interaction terms in logit and probit models. Economics Letters 80(1): 123-129.

Buis, M. L. 2010. Stata tip 87: Interpretation of interactions in nonlinear models. The Stata Journal 10(2): 305-308.

Williams, R. 2012. Using the margins command to estimate and interpret adjusted predictions and marginal effects. The Stata Journal 12(2): 308-331.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/