Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Obtaining marginal effects and their standard errors after
From
Arne Risa Hole <[email protected]>
To
[email protected]
Subject
Re: st: Obtaining marginal effects and their standard errors after
Date
Wed, 9 Jan 2013 10:43:35 +0000
Dear Vince,
Thanks for posting this. I found it very illuminating, in particular
the clever uses of the contrast features of -margins-.
Best wishes,
Arne
On 8 January 2013 23:50, Vince Wiggins, StataCorp <[email protected]> wrote:
> Arne Risa Hole <[email protected]> and Richard Williams
> <[email protected]> have had an illuminating exchange about
> the computation and meaning of interaction effects on the probability
> of a positive outcome in models with a binary response. The discussion
> applies to any response that is not a linear combination of the
> coefficients, but let's stick with probabilities. I have a few related
> thoughts and also want to show off some of -margins-'s lesser-known
> features using Arne's clever examples.
>
> Richard wonders "why margins does not provide marginal effects for
> interactions". We have nothing against so called "interaction
> effects", though as Richard notes they are a funny kind of effect. You
> cannot change an interaction directly; you can only change its
> constituent pieces. (Hence the scare quotes around "interaction
> effects".)
> You can, however, interpret an interaction, and as Arne notes, that
> interpretation is just the change in the slope of one variable as the
> other variable itself changes,
>
>                      d^2(y)
>    interaction = -----------
>                  d(x1) d(x2)
>
> What I will dub "own interactions", interactions of a variable with
> itself, have a long history in physics. The slope of distance with
> respect to time is velocity,
>
>               d(distance)
>    velocity = -----------
>                 d(time)
>
> and, the interaction with time itself being acceleration,
>
>                    d(distance)       d^2(distance)
>    acceleration = --------------- = -------------
>                   d(time) d(time)     d(time)^2
>
> An "own interaction" does not have the problem that we are required to
> think of changing the interaction itself. There is only one variable
> to change. Moreover, we rarely have such nice descriptions of our
> interactions, own or otherwise. When we regress mileage on weight and
> weight squared, we are simply admitting that a linear relationship
> doesn't match the data, and we need some flexibility in the
> relationship between mileage and weight. We do not think that weight
> squared has its own interpretation.
>
> In such cases, I am a fan of visualizing the relationships over a range
> of meaningful values, rather than trying to create a single number that
> summarizes the "interaction effect". We know that the effects differ
> for different levels of the interacted variables and for different
> levels of other variables. Best to admit this and evaluate the
> response at different points. As Richard points out, "the problem with
> any `average' number (AME or MEM) is that it disguises a great deal of
> individual level variability ... That is why I like MERs (marginal
> effects at representative values), or else APRs (a term I made up)
> which stands for Adjusted Predictions at Representative Values." Me
> too.
>
> Richard's slides on using -margins- in this context should be required
> reading,
>
> http://www.nd.edu/~rwilliam/stats/Margins01.pdf
>
> as should his Stata Journal article,
>
> http://www.stata-journal.com/article.html?article=st0260
>
> If you are trying to test whether an interaction term in your model is
> statistically significant, do that in the metric in which you estimated
> the model. That is to say, look at the test statistics on the
> interaction term.
>
> One thing to keep in mind is that with a nonlinear response (e.g.,
> probabilities in a probit or logit model) you have an interaction
> effect between your covariates even when you do not have an interaction
> term in the model. The probability is an S-shaped response in Xb, so,
> as any covariate changes, it pushes the response of the other
> covariates into either one of the tails, where the response is
> attenuated, or toward the center, where the response is
> strengthened.
>
> Try this example
>
> . webuse margex
> . probit outcome age distance
> . margins, dydx(age) at(distance=(0(100)800))
> . marginsplot
>
> We estimated a model with no interaction, yet when we graph the
> marginal effect of age over a range of distances, we find a strong
> downward trend in the change in probability for a change in age as
> distance increases.
>
> Even more fun, try this example,
>
> . clear
> . set seed 12345
> . set obs 5000
>
> . gen x = runiform() - .5
> . gen z = runiform() - .5
> . gen xb = x + 8*z
> . gen y = 1 / (1 + exp(xb)) < runiform()
>
> . logit y x z
>
> . margins, dydx(x) at(z=(-.5(.1).5))
> . marginsplot
>
> Again, we have no interaction term in the model, but plenty of
> "interaction effect" on the probability. The marginal effect of x on
> probability traces out a nice bell-shaped curve as z increases. The
> marginal effect of x on probability first rises as z rises, then peaks
> and falls as z continues to rise. The "interaction" is pronounced, the
> marginal effect rising from near 0 to about .25, then falling back to
> 0.
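The bell shape is exactly what the logit functional form predicts: the marginal effect of x is b_x * p * (1 - p), which peaks where p = 0.5 and vanishes in the tails. A quick sketch in Python (illustrative only; it uses the population coefficients b_x = 1 and b_z = 8 from the simulation above, not the estimates) traces the same curve analytically:

```python
import math

def logit_p(x, z, bx=1.0, bz=8.0):
    """P(y=1) for a logit model with linear index bx*x + bz*z."""
    return 1.0 / (1.0 + math.exp(-(bx * x + bz * z)))

def meff_x(x, z, bx=1.0, bz=8.0):
    """Analytic marginal effect dP/dx = bx * p * (1 - p)."""
    p = logit_p(x, z, bx, bz)
    return bx * p * (1.0 - p)

# Marginal effect of x (at x = 0) over the same grid of z values
# used in the -margins- call above: -.5(.1).5
zs = [-0.5 + 0.1 * i for i in range(11)]
effects = [meff_x(0.0, z) for z in zs]
```

Plotting `effects` against `zs` reproduces the bell that -marginsplot- draws: near 0 in both tails and about .25 at z = 0.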
>
> Despite this pronounced "interaction", if we were to compute the
> average "interaction effect", it would be 0 (at least asymptotically).
> It is 0 because the positive and negative interactions sum to 0 in this
> example. This is directly analogous to the well-worn example of
> fitting a linear model to quadratic data and finding no relationship.
> That is why I do not like to talk about continuous-continuous
> "interaction effects" as a single value. I would rather explore the
> MERs or APRs.
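The zero average is a symmetry result, and it can be checked analytically. For a logit with index b_x*x + b_z*z and no interaction term, the cross-partial works out to b_x * b_z * p(1 - p)(1 - 2p), which changes sign at p = 0.5. A Python sketch (using the simulation's population coefficients b_x = 1 and b_z = 8, not estimates) averages it over a z grid symmetric around 0:

```python
import math

def p(u):
    """Logit probability at linear index u."""
    return 1.0 / (1.0 + math.exp(-u))

def cross(x, z, bx=1.0, bz=8.0):
    """Analytic d2P/(dx dz) for a logit with index bx*x + bz*z."""
    q = p(bx * x + bz * z)
    return bx * bz * q * (1.0 - q) * (1.0 - 2.0 * q)

# Symmetric grid of z values around 0, evaluated at x = 0
zs = [-0.5 + 0.01 * i for i in range(101)]
avg = sum(cross(0.0, z) for z in zs) / len(zs)
```

The positive cross-partials below p = 0.5 cancel the negative ones above it, so `avg` is 0 to machine precision even though the pointwise "interaction effect" is far from 0.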
>
> These graphs are as we would expect. Logit (and probit) probabilities
> look like,
>
> pr = f(Xb)
>
> where f() is a monotonically increasing function of Xb that asymptotes
> to 0 as xb -> -infinity and asymptotes to 1 as xb -> +infinity. That
> is to say it is an S-shaped function in Xb.
>
> If z is a covariate in the set of covariates X, then,
>
> marginal effect of z = d(pr)/d(z) = d(pr)/d(Xb) * d(Xb)/d(z)
>
> So, every marginal effect also includes a contribution from all
> other covariates in the model (the X in Xb). In fact d(pr)/d(Xb) will
> always map out the bell-shaped curve over a sufficient range of Xb.
> So, all logit and probit models have an interaction by construction,
> even when we do not introduce interaction terms.
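That chain rule is easy to verify numerically. In a small Python sketch (the coefficients bx and bz below are made up for illustration), the analytic marginal effect p(1 - p) * bz agrees with a finite-difference derivative of the probability:

```python
import math

def p_logit(xb):
    """Logit probability as a function of the linear index."""
    return 1.0 / (1.0 + math.exp(-xb))

bx, bz = 0.5, 1.5          # hypothetical coefficients
x, z = 0.3, -0.2           # an arbitrary evaluation point
xb = bx * x + bz * z

# Chain rule: d(pr)/d(z) = d(pr)/d(Xb) * d(Xb)/d(z) = p(1 - p) * bz
analytic = p_logit(xb) * (1.0 - p_logit(xb)) * bz

# The same derivative by forward difference in z
eps = 1e-6
numeric = (p_logit(bx * x + bz * (z + eps)) - p_logit(xb)) / eps
```

The factor d(pr)/d(Xb) = p(1 - p) depends on the whole index Xb, which is how every other covariate enters the marginal effect of z.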
>
> These built-in interactions from nonlinear responses lie at the heart
> of Ai and Norton's (2003) protracted explorations of interactions.
>
> These nonlinearities do not exist in the natural metric of the model.
> If we think of the response of the probit model as being a one-standard
> deviation change in the latent response (index value if you prefer
> GLM), then we have no nonlinearities, and we can directly interpret our
> coefficients. The case is even more compelling for logistic models,
> where the parameter estimates can be expressed as odds ratios that do
> not change as the levels of other variables change. Maarten Buis has
> championed this approach many times on the list, e.g.,
>
> http://www.stata.com/statalist/archive/2010-08/msg00968.html
>
> with reference to an associated Stata Journal article,
>
> http://www.maartenbuis.nl/publications/interactions.html
>
> Even so, changes in probability, or another nonlinear response, can
> often be useful in characterizing a model. And, you say, you still
> want an "interaction effect" on a nonlinear response. -margins- can
> directly compute these effects for any number of interactions of
> indicator or factor-variable covariates and for interactions of those
> with continuous covariates. It cannot directly compute the effects
> of continuous-continuous interactions. Given what we have seen above,
> I contend that continuous-continuous interactions are the least useful
> interactions and those most likely to obscure important relationships.
>
> That said, Arne has shown how to creatively use -margins- to
> numerically compute the pieces of a continuous-continuous interaction,
> and then assemble the interaction yourself. I have a simplification of
> Arne's example for those wanting the effects computed at the means of
> the covariates.
>
> Set up the dataset, and run the probit model
>
> . sysuse auto, clear
> . replace weight=weight/1000
> . replace length=length/10
> . probit foreign weight length c.weight#c.length, nolog
>
> Rather than,
>
> . margins, dydx(*) atmeans at(weight=3.019559)
> . matrix b = r(b)
> . scalar meff_length_1 = b[1,2]
>
> . margins, dydx(*) atmeans at(weight=3.019459)
> . matrix b = r(b)
> . scalar meff_length_0 = b[1,2]
>
> . di (meff_length_1 - meff_length_0) / 0.0001
>
> you could use the -margins- contrast operator to take the
> difference between the marginal effect for the two values of
> weight,
>
> . margins, dydx(length) atmeans at(weight=3.019459) at(weight=3.019559) ///
>       contrast(atcontrast(r._at)) post
> . margins, coeflegend
> . nlcom _b[r2vs1._at] / .0001
>
> One tricky part of the -margins- command is -at(weight=3.019459)
> at(weight=3.019559)-. We are simply evaluating the derivative
> -dydx(length)- at the mean of weight and at the mean of weight plus a
> small epsilon, so we can numerically take the cross derivative w.r.t.
> weight. A second tricky part is -contrast(atcontrast(r._at))-. We are
> asking for the contrast (difference) in the two at() values we
> specified for weight. We use the -post- option of -margins- to post
> the results as estimation results, then use -nlcom- to divide by our
> epsilon.
>
> I typed -margins, coeflegend- only because we would never know that we
> need to refer to the estimated difference as _b[r2vs1._at] without that
> legend. The simplified technique has the added benefit of providing
> confidence intervals on the estimate.
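The contrast-divided-by-epsilon trick is the standard forward-difference approximation of a cross-partial, and it can be sketched outside Stata. The Python below assumes a probit with a weight-length interaction and invented coefficients (b0, bw, bl, bwl are hypothetical, not the estimates from the auto data); it differences the marginal effect of length at two nearby values of weight, just as the -margins- command above does:

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

# Hypothetical probit: P = Phi(b0 + bw*w + bl*l + bwl*w*l)
b0, bw, bl, bwl = -1.0, 0.4, 0.2, 0.3
w, l = 3.0, 1.9            # think: the means of weight and length

def dP_dl(w, l):
    """Analytic marginal effect of l at (w, l)."""
    xb = b0 + bw * w + bl * l + bwl * w * l
    return phi(xb) * (bl + bwl * w)

# margins-style numerical cross-partial: difference the marginal
# effect of l at w and at w + eps, then divide by eps
eps = 1e-4
cross_numeric = (dP_dl(w + eps, l) - dP_dl(w, l)) / eps

# Analytic cross-partial for comparison, using phi'(u) = -u * phi(u)
xb = b0 + bw * w + bl * l + bwl * w * l
cross_analytic = (-xb * phi(xb) * (bw + bwl * l) * (bl + bwl * w)
                  + phi(xb) * bwl)
```

Shrinking eps tightens the agreement until floating-point noise takes over, which is where the sensitivity testing of the step size comes in.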
>
> Given that we know the exact form of the probability, we would still
> get the most accurate results using the method described in the FAQ
> that led to the original question in this thread,
>
> http://www.stata.com/support/faqs/statistics/marginal-effects-after-interactions/
>
> Although I agree with Arne that the numerical example using -margins-
> is mostly pedagogical, I admit that in the dark ages, before -margins-
> existed, I regularly performed such computations. With a little
> sensitivity testing of the epsilon used to compute the derivative
> (.0001 above), these can be accurate estimates.
>
> We can use Arne's example of a continuous-factor interaction to show
> how to estimate the "interaction effect" using only -margins-. I am
> again showing Arne's full example, because it makes clear what
> -margins- is computing.
>
> Set up the dataset, and run the probit model
>
> . sysuse auto, clear
> . set seed 12345
> . generate dum=runiform()>0.5
> . probit foreign turn i.dum i.dum#c.turn, nolog
>
> Rather than,
>
> . margins, dydx(*) atmeans at(dum=1)
> . matrix b = r(b)
> . scalar meff_turn_dum1 = b[1,1]
>
> . margins, dydx(*) atmeans at(dum=0)
> . matrix b = r(b)
> . scalar meff_turn_dum0 = b[1,1]
>
> . di meff_turn_dum1 - meff_turn_dum0
>
> use the -margins- contrast operator to compute the interaction,
>
> . margins r.dum, dydx(turn) atmeans
>
> With this approach, we can remove the -atmeans- option and estimate the
> average "interaction effect", rather than the "interaction effect" at
> the means,
>
> . margins r.dum, dydx(turn)
>
> These "interaction effects" do not bother me in the same way a
> continuous-continuous "interaction effect" does. Why? Because there
> are only two values for the variable dum. That means we have
> completely explored the interaction space of the two variables dum and
> turn. It does not mean that we have explored how the marginal effect
> of turn varies with its own values or those of other covariates in the
> model, and that is why I would still look at the MERs and APRs.
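For intuition, what -margins r.dum, dydx(turn)- reports at the means can be written out directly. A Python sketch with a hypothetical probit (the coefficients and the value of turn below are invented, not the estimates from the auto data):

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

# Hypothetical probit coefficients for foreign on turn, dum, dum#turn
b0, bt, bd, btd = 2.0, -0.08, 0.5, -0.02
turn = 39.6                # think: the sample mean of turn

def P(turn, dum):
    """Predicted probability at (turn, dum)."""
    xb = b0 + bt * turn + bd * dum + btd * dum * turn
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))

def meff_turn(dum):
    """Analytic marginal effect of turn at a given level of dum."""
    xb = b0 + bt * turn + bd * dum + btd * dum * turn
    return phi(xb) * (bt + btd * dum)

# The continuous-factor "interaction effect": the marginal effect of
# turn at dum = 1 minus the marginal effect of turn at dum = 0
interaction = meff_turn(1) - meff_turn(0)
```

The interaction effect is just the gap between two slopes, one per level of dum, each evaluated at the same point.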
>
> Factor-factor "interaction effects" can also be estimated using the
> contrast operators.
>
> For the model,
>
> . logit y A##B ...
>
> type,
>
> . margins r.A#r.B
>
>
> to estimate the average "interaction effect"
>
> or, to estimate the "interaction effect" at the means, type
>
> . margins r.A#r.B, atmeans
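In a probability metric, the factor-factor contrast that -margins r.A#r.B- computes is a difference in differences of predicted probabilities. A sketch with a hypothetical probit (the coefficients below are invented for illustration):

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical probit coefficients for two binary factors A and B
# and their interaction A#B
b0, bA, bB, bAB = -0.3, 0.6, 0.4, -0.5

def P(A, B):
    """Predicted probability at (A, B)."""
    return Phi(b0 + bA * A + bB * B + bAB * A * B)

# Factor-factor "interaction effect": a difference in differences
# of the four predicted probabilities
interaction = (P(1, 1) - P(0, 1)) - (P(1, 0) - P(0, 0))
```

With a nonlinear link this quantity is generally nonzero even when bAB = 0, which is the built-in interaction discussed above.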
>
>
> This naturally extends to multiway interactions,
>
> . logit y A##B##C ...
>
> . margins r.A#r.B#r.C
>
> Again, these "interaction effects" do not bother me in the way
> continuous-continuous interactions do. With factor variables, the
> interactions are exploring the complete space of results. Even so, I
> still like to look at the margins (estimated means),
>
> . margins A#B#C
>
> It has been my experience that the contrast operators and other
> contrast features added to -margins- in Stata 12 have gone largely
> unnoticed. I am glad Arne's examples provided a platform to
> demonstrate what they do.
>
>
>
> -- Vince
> [email protected]
>
>
>
> Ai, C. R. and E. C. Norton. 2003. Interaction terms in logit and probit
> models. Economics Letters 80(1): 123-129.
>
> Buis, M. L. 2010. Stata tip 87: Interpretation of interactions in
> nonlinear models. Stata Journal 10(2): 305-308.
>
> Williams, R. 2012. Using the margins command to estimate and interpret
> adjusted predictions and marginal effects. Stata Journal 12(2): 308-331.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/