Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Obtaining marginal effects and their standard errors after
From
Arne Risa Hole <[email protected]>
To
[email protected]
Subject
Re: st: Obtaining marginal effects and their standard errors after
Date
Wed, 9 Jan 2013 10:43:35 +0000
Dear Vince,
Thanks for posting this. I found it very illuminating, in particular
the clever uses of the contrast features of -margins-.
Best wishes,
Arne
On 8 January 2013 23:50, Vince Wiggins, StataCorp <[email protected]> wrote:
> Arne Risa Hole <[email protected]> and Richard Williams
> <[email protected]> have had an illuminating exchange about
> the computation and meaning of interaction effects on the probability
> of a positive outcome in models with a binary response. The discussion
> applies to any response that is not a linear combination of the
> coefficients, but let's stick with probabilities. I have a few related
> thoughts and also want to show off some of -margins-'s lesser-known
> features using Arne's clever examples.
>
> Richard wonders "why margins does not provide marginal effects for
> interactions". We have nothing against so called "interaction
> effects", though as Richard notes they are a funny kind of effect. You
> cannot change an interaction directly; you can only change its
> constituent pieces. (Hence the scare quotes around "interaction
> effects".)
> You can, however, interpret an interaction, and as Arne notes, that
> interpretation is just the change in the slope of one variable as the
> other variable itself changes,
>
>                      d^2(y)
>    interaction = -----------
>                  d(x1) d(x2)
>
> What I will dub "own interactions", interactions of a variable with
> itself, have a long history in physics. The slope of distance with
> respect to time is velocity,
>
>               d(distance)
>    velocity = -----------
>                 d(time)
>
> and, the interaction with time itself being acceleration,
>
>                    d(distance)       d^2(distance)
>    acceleration = --------------- = -------------
>                   d(time) d(time)     d(time)^2
>
> An "own interaction" does not have the problem that we are required to
> think of changing the interaction itself. There is only one variable
> to change. Moreover, we rarely have such nice descriptions of our
> interactions, own or otherwise. When we regress mileage on weight and
> weight squared, we are simply admitting that a linear relationship
> doesn't match the data, and we need some flexibility in the
> relationship between mileage and weight. We do not think that weight
> squared has its own interpretation.
>
> In such cases, I am a fan of visualizing the relationships over a range
> of meaningful values, rather than trying to create a single number that
> summarizes the "interaction effect". We know that the effects differ
> for different levels of the interacted variables and for different
> levels of other variables. Best to admit this and evaluate the
> response at different points. As Richard points out, "the problem with
> any `average' number (AME or MEM) is that it disguises a great deal of
> individual level variability ... That is why I like MERs (marginal
> effects at representative values), or else APRs (a term I made up)
> which stands for Adjusted Predictions at Representative Values." Me
> too.
>
> Richard's slides on using -margins- in this context should be required
> reading,
>
> http://www.nd.edu/~rwilliam/stats/Margins01.pdf
>
> as should his Stata Journal article,
>
> http://www.stata-journal.com/article.html?article=st0260
>
> If you are trying to test whether an interaction term in your model is
> statistically significant, do that in the metric in which you estimated
> the model. That is to say, look at the test statistics on the
> interaction term.
>
> One thing to keep in mind is that with a nonlinear response (e.g.,
> probabilities in a probit or logit model) you have an interaction
> effect between your covariates even when you do not have an interaction
> term in the model. The probability is an S-shaped response in Xb, so,
> as any covariate changes, it pushes the response of the other
> covariates into either one of the tails, where the response is
> attenuated, or toward the center, where the response is
> strengthened.
>
> Try this example
>
> . webuse margex
> . probit outcome age distance
> . margins, dydx(age) at(distance=(0(100)800))
> . marginsplot
>
> We estimated a model with no interaction, yet when we graph the
> marginal effect of age over a range of distances, we find a strong
> downward trend in the change in probability for a change in age as
> distance increases.
>
> Even more fun, try this example,
>
> . clear
> . set seed 12345
> . set obs 5000
>
> . gen x = runiform() - .5
> . gen z = runiform() - .5
> . gen xb = x + 8*z
> . gen y = 1 / (1 + exp(xb)) < runiform()
>
> . logit y x z
>
> . margins, dydx(x) at(z=(-.5(.1).5))
> . marginsplot
>
> Again, we have no interaction term in the model, but plenty of
> "interaction effect" on the probability. The marginal effect of x on
> probability traces out a nice bell-shaped curve as z increases. The
> marginal effect of x on probability first rises as z rises, then peaks
> and falls as z continues to rise. The "interaction" is pronounced, the
> marginal effect rising from near 0 to about .25, then falling back to
> 0.
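The bell shape is exactly what the logit functional form predicts: the marginal effect of x is b_x * p * (1 - p), which peaks where p = 0.5 and vanishes in the tails. A quick sketch in Python (illustrative only; it uses the population coefficients b_x = 1 and b_z = 8 from the simulation above, not the estimates) traces the same curve analytically:

```python
import math

def logit_p(x, z, bx=1.0, bz=8.0):
    """P(y=1) for a logit model with linear index bx*x + bz*z."""
    return 1.0 / (1.0 + math.exp(-(bx * x + bz * z)))

def meff_x(x, z, bx=1.0, bz=8.0):
    """Analytic marginal effect dP/dx = bx * p * (1 - p)."""
    p = logit_p(x, z, bx, bz)
    return bx * p * (1.0 - p)

# Marginal effect of x (at x = 0) over the same grid of z values
# used in the -margins- call above: -.5(.1).5
zs = [-0.5 + 0.1 * i for i in range(11)]
effects = [meff_x(0.0, z) for z in zs]
```

Plotting `effects` against `zs` reproduces the bell that -marginsplot- draws: near 0 in both tails and about .25 at z = 0.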
>
> Despite this pronounced "interaction", if we were to compute the
> average "interaction effect", it would be 0 (at least asymptotically).
> It is 0 because the positive and negative interactions sum to 0 in this
> example. This is directly analogous to the well-worn example of
> fitting a linear model to quadratic data and finding no relationship.
> That is why I do not like to talk about continuous-continuous
> "interaction effects" as a single value. I would rather explore the
> MERs or APRs.
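The zero average is a symmetry result, and it can be checked analytically. For a logit with index b_x*x + b_z*z and no interaction term, the cross-partial works out to b_x * b_z * p(1 - p)(1 - 2p), which changes sign at p = 0.5. A Python sketch (using the simulation's population coefficients b_x = 1 and b_z = 8, not estimates) averages it over a z grid symmetric around 0:

```python
import math

def p(u):
    """Logit probability at linear index u."""
    return 1.0 / (1.0 + math.exp(-u))

def cross(x, z, bx=1.0, bz=8.0):
    """Analytic d2P/(dx dz) for a logit with index bx*x + bz*z."""
    q = p(bx * x + bz * z)
    return bx * bz * q * (1.0 - q) * (1.0 - 2.0 * q)

# Symmetric grid of z values around 0, evaluated at x = 0
zs = [-0.5 + 0.01 * i for i in range(101)]
avg = sum(cross(0.0, z) for z in zs) / len(zs)
```

The positive cross-partials below p = 0.5 cancel the negative ones above it, so `avg` is 0 to machine precision even though the pointwise "interaction effect" is far from 0.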
>
> These graphs are as we would expect. Logit (and probit) probabilities
> look like,
>
> pr = f(Xb)
>
> where f() is a monotonically increasing function of Xb that asymptotes
> to 0 as xb -> -infinity and asymptotes to 1 as xb -> +infinity. That
> is to say it is an S-shaped function in Xb.
>
> If z is a covariate in the set of covariates X, then,
>
> marginal effect of z = d(pr)/d(z) = d(pr)/d(Xb) * d(Xb)/d(z)
>
> So, every marginal effect also includes a contribution from all
> other covariates in the model (the X in Xb). In fact d(pr)/d(Xb) will
> always map out the bell-shaped curve over a sufficient range of Xb.
> So, all logit and probit models have an interaction by construction,
> even when we do not introduce interaction terms.
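That chain rule is easy to verify numerically. In a small Python sketch (the coefficients bx and bz below are made up for illustration), the analytic marginal effect p(1 - p) * bz agrees with a finite-difference derivative of the probability:

```python
import math

def p_logit(xb):
    """Logit probability as a function of the linear index."""
    return 1.0 / (1.0 + math.exp(-xb))

bx, bz = 0.5, 1.5          # hypothetical coefficients
x, z = 0.3, -0.2           # an arbitrary evaluation point
xb = bx * x + bz * z

# Chain rule: d(pr)/d(z) = d(pr)/d(Xb) * d(Xb)/d(z) = p(1 - p) * bz
analytic = p_logit(xb) * (1.0 - p_logit(xb)) * bz

# The same derivative by forward difference in z
eps = 1e-6
numeric = (p_logit(bx * x + bz * (z + eps)) - p_logit(xb)) / eps
```

The factor d(pr)/d(Xb) = p(1 - p) depends on the whole index Xb, which is how every other covariate enters the marginal effect of z.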
>
> These built-in interactions from nonlinear responses lie at the heart
> of Ai and Norton's (2003) protracted explorations of interactions.
>
> These nonlinearities do not exist in the natural metric of the model.
> If we think of the response of the probit model as being a one-standard
> deviation change in the latent response (index value if you prefer
> GLM), then we have no nonlinearities, and we can directly interpret our
> coefficients. The case is even more compelling for logistic models,
> where the parameter estimates can be expressed as odds ratios that do
> not change as the levels of other variables change. Maarten Buis has
> championed this approach many times on the list, e.g.,
>
> http://www.stata.com/statalist/archive/2010-08/msg00968.html
>
> with reference to an associated Stata Journal article,
>
> http://www.maartenbuis.nl/publications/interactions.html
>
> Even so, changes in probability, or another nonlinear response, can
> often be useful in characterizing a model. And, you say, you still
> want an "interaction effect" on a nonlinear response. -margins- can
> directly compute these effects for any number of interactions of
> indicator or factor-variable covariates and for interactions of those
> with continuous covariates. It cannot directly compute the effects
> of continuous-continuous interactions. Given what we have seen above,
> I contend that continuous-continuous interactions are the least useful
> interactions and those most likely to obscure important relationships.
>
> That said, Arne has shown how to creatively use -margins- to
> numerically compute the pieces of a continuous-continuous interaction,
> and then assemble the interaction yourself. I have a simplification of
> Arne's example for those wanting the effects computed at the means of
> the covariates.
>
> Set up the dataset, and run the probit model
>
> . sysuse auto, clear
> . replace weight=weight/1000
> . replace length=length/10
> . probit foreign weight length c.weight#c.length, nolog
>
> Rather than,
>
> . margins, dydx(*) atmeans at(weight=3.019559)
> . matrix b = r(b)
> . scalar meff_length_1 = b[1,2]
>
> . margins, dydx(*) atmeans at(weight=3.019459)
> . matrix b = r(b)
> . scalar meff_length_0 = b[1,2]
>
> . di (meff_length_1 - meff_length_0) / 0.0001
>
> you could use the -margins- contrast operator to take the
> difference between the marginal effect for the two values of
> weight,
>
> . margins, dydx(length) atmeans at(weight=3.019459) at(weight=3.019559) ///
>       contrast(atcontrast(r._at)) post
> . margins, coeflegend
> . nlcom _b[r2vs1._at] / .0001
>
> One tricky part of the -margins- command is -at(weight=3.019459)
> at(weight=3.019559)-. We are simply evaluating the derivative
> -dydx(length)- at the mean of weight and at the mean of weight plus a
> small epsilon, so we can numerically take the cross derivative w.r.t.
> weight. A second tricky part is -contrast(atcontrast(r._at))-. We are
> asking for the contrast (difference) in the two at() values we
> specified for weight. We use the -post- option of -margins- to post
> the results as estimation results, then use -nlcom- to divide by our
> epsilon.
>
> I typed -margins, coeflegend- only because we would never know that we
> need to refer to the estimated difference as _b[r2vs1._at] without that
> legend. The simplified technique has the added benefit of providing
> confidence intervals on the estimate.
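The contrast-divided-by-epsilon trick is the standard forward-difference approximation of a cross-partial, and it can be sketched outside Stata. The Python below assumes a probit with a weight-length interaction and invented coefficients (b0, bw, bl, bwl are hypothetical, not the estimates from the auto data); it differences the marginal effect of length at two nearby values of weight, just as the -margins- command above does:

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

# Hypothetical probit: P = Phi(b0 + bw*w + bl*l + bwl*w*l)
b0, bw, bl, bwl = -1.0, 0.4, 0.2, 0.3
w, l = 3.0, 1.9            # think: the means of weight and length

def dP_dl(w, l):
    """Analytic marginal effect of l at (w, l)."""
    xb = b0 + bw * w + bl * l + bwl * w * l
    return phi(xb) * (bl + bwl * w)

# margins-style numerical cross-partial: difference the marginal
# effect of l at w and at w + eps, then divide by eps
eps = 1e-4
cross_numeric = (dP_dl(w + eps, l) - dP_dl(w, l)) / eps

# Analytic cross-partial for comparison, using phi'(u) = -u * phi(u)
xb = b0 + bw * w + bl * l + bwl * w * l
cross_analytic = (-xb * phi(xb) * (bw + bwl * l) * (bl + bwl * w)
                  + phi(xb) * bwl)
```

Shrinking eps tightens the agreement until floating-point noise takes over, which is where the sensitivity testing of the step size comes in.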
>
> Given that we know the exact form of the probability, we would still
> get the most accurate results using the method described in the FAQ
> that led to the original question in this thread,
>
> http://www.stata.com/support/faqs/statistics/marginal-effects-after-interactions/
>
> Although I agree with Arne that the numerical example using -margins-
> is mostly pedagogical, I admit that in the dark ages, before -margins-
> existed, I regularly performed such computations. With a little
> sensitivity testing of the epsilon used to compute the derivative
> (.0001 above), these can be accurate estimates.
>
> We can use Arne's example of a continuous-factor interaction to show
> how to estimate the "interaction effect" using only -margins-. I am
> again showing Arne's full example, because it makes clear what
> -margins- is computing.
>
> Set up the dataset, and run the probit model
>
> . sysuse auto, clear
> . set seed 12345
> . generate dum=runiform()>0.5
> . probit foreign turn i.dum i.dum#c.turn, nolog
>
> Rather than,
>
> . margins, dydx(*) atmeans at(dum=1)
> . matrix b = r(b)
> . scalar meff_turn_dum1 = b[1,1]
>
> . margins, dydx(*) atmeans at(dum=0)
> . matrix b = r(b)
> . scalar meff_turn_dum0 = b[1,1]
>
> . di meff_turn_dum1 - meff_turn_dum0
>
> use the -margins- contrast operator to compute the interaction,
>
> . margins r.dum, dydx(turn) atmeans
>
> With this approach, we can remove the -atmeans- option and estimate the
> average "interaction effect", rather than the "interaction effect" at
> the means,
>
> . margins r.dum, dydx(turn)
>
> These "interaction effects" do not bother me in the same way a
> continuous-continuous "interaction effect" does. Why? Because there
> are only two values for the variable dum. That means we have
> completely explored the interaction space of the two variables dum and
> turn. It does not mean that we have explored how the marginal effect
> of turn varies with its own values or those of other covariates in the
> model, and that is why I would still look at the MERs and APRs.
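For intuition, what -margins r.dum, dydx(turn)- reports at the means can be written out directly. A Python sketch with a hypothetical probit (the coefficients and the value of turn below are invented, not the estimates from the auto data):

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

# Hypothetical probit coefficients for foreign on turn, dum, dum#turn
b0, bt, bd, btd = 2.0, -0.08, 0.5, -0.02
turn = 39.6                # think: the sample mean of turn

def P(turn, dum):
    """Predicted probability at (turn, dum)."""
    xb = b0 + bt * turn + bd * dum + btd * dum * turn
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))

def meff_turn(dum):
    """Analytic marginal effect of turn at a given level of dum."""
    xb = b0 + bt * turn + bd * dum + btd * dum * turn
    return phi(xb) * (bt + btd * dum)

# The continuous-factor "interaction effect": the marginal effect of
# turn at dum = 1 minus the marginal effect of turn at dum = 0
interaction = meff_turn(1) - meff_turn(0)
```

The interaction effect is just the gap between two slopes, one per level of dum, each evaluated at the same point.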
>
> Factor-factor "interaction effects" can also be estimated using the
> contrast operators.
>
> For the model,
>
> . logit y A##B ...
>
> type,
>
> . margins r.A#r.B
>
>
> to estimate the average "interaction effect"
>
> or, to estimate the "interaction effect" at the means, type
>
> . margins r.A#r.B, atmeans
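In a probability metric, the factor-factor contrast that -margins r.A#r.B- computes is a difference in differences of predicted probabilities. A sketch with a hypothetical probit (the coefficients below are invented for illustration):

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical probit coefficients for two binary factors A and B
# and their interaction A#B
b0, bA, bB, bAB = -0.3, 0.6, 0.4, -0.5

def P(A, B):
    """Predicted probability at (A, B)."""
    return Phi(b0 + bA * A + bB * B + bAB * A * B)

# Factor-factor "interaction effect": a difference in differences
# of the four predicted probabilities
interaction = (P(1, 1) - P(0, 1)) - (P(1, 0) - P(0, 0))
```

With a nonlinear link this quantity is generally nonzero even when bAB = 0, which is the built-in interaction discussed above.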
>
>
> This naturally extends to multiway interactions,
>
> . logit y A##B##C ...
>
> . margins r.A#r.B#r.C
>
> Again, these "interaction effects" do not bother me in the way
> continuous-continuous interactions do. With factor variables, the
> interactions are exploring the complete space of results. Even so, I
> still like to look at the margins (estimated means),
>
> . margins A#B#C
>
> It has been my experience that the contrast operators and other
> contrast features added to -margins- in Stata 12 have gone largely
> unnoticed. I am glad Arne's examples provided a platform to
> demonstrate what they do.
>
>
>
> -- Vince
> [email protected]
>
>
>
> Ai, C. R. and E. C. Norton. 2003. Interaction terms in logit and probit
> models. Economics Letters 80(1): 123-129.
>
> Buis, M. L. 2010. Stata tip 87: Interpretation of interactions in
> nonlinear models. Stata Journal 10(2): 305-308.
>
> Williams, R. 2012. Using the margins command to estimate and interpret
> adjusted predictions and marginal effects. Stata Journal 12(2): 308-331.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/