Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Margins in STATA 12 - how to use at () for dichotomous variables?
From
Joerg Luedicke <[email protected]>
To
[email protected]
Subject
Re: st: Margins in STATA 12 - how to use at () for dichotomous variables?
Date
Wed, 19 Dec 2012 20:36:10 -0500
On Wed, Dec 19, 2012 at 12:08 PM, Neeraj Iyer <[email protected]> wrote:
> 1) When gender is held at value=1, the marginal effect of x1 is for
> either all men or all women.
Yes, depending on what is coded as 1.
> 2) When gender is held at its mean, the marginal effect of x1 is
> controlling for gender based on the proportion of those coded as '1'.
I would not use the term "controlling" here. What it simply means is
that you plug in the mean of the gender variable into the regression
equation when doing the predictions. Say you have 50% males and 50%
females, then - strictly speaking - you would get marginal effects for
people who are half male and half female.
> 3) When gender is averaged over men and women, the marginal effect of
> x1 is the arithmetic mean of the marginal effect of men and that of
> women.
Not quite. It is essentially a weighted sum of the predictions for
males and those for females, where the predictions for males are
weighted with the proportion of males and the predictions for females
are weighted with the proportion of females. To see this, we can look
at an example. In order to simplify things, lets say we have two
binary predictors, say treatment and gender, and want to predict
counts for both treatment arms, averaged over males and females.
First we generate some data:
//Some toy data (I use some modified code from Joseph Hilbe
//(http://www.stata.com/statalist/archive/2011-06/msg00183.html)
clear
set obs `=1e4'
set seed 1234
gen treat = runiform()>.5
gen gender = runiform()>.5
//Count component
gen xb = 2 + 0.5*treat - 0.5*gender
gen a = .5
gen ia = 1/a
gen exb = exp(xb)
gen xg = rgamma(ia, a)
gen xbg = exb * xg
gen nby = rpoisson(xbg)
//Binary component
gen pi =1/(1+exp(-(0.5*treat + 0.5*gender + 0.2)))
gen bernoulli = runiform()>pi
gen zy = bernoulli*nby
rename zy y
* --> fit the model:
//Model fit
zinb y i.treat gender, inflate(i.treat gender)
* --> ...and compute marginal counts for both treatment arms for
gender=0, gender=1:
//Manual calculation of marginal counts
*marginal counts for control and treatment, for gender=0
margins i.treat, at(gender=0)
gen cou_cont_g0=exp(2.0276)
gen cou_trt_g0=exp(2.0276 + .4674933)
gen p_cont_g0 = 1 - ( exp(.2195245) / (1+exp(.2195245) ) )
gen p_trt_g0 = 1 - ( exp(.2195245 + .4923645) / (1+exp(.2195245 + .4923645) ) )
gen cont_g0 = cou_cont_g0 * p_cont_g0
gen trt_g0 = cou_trt_g0 * p_trt_g0
sum cont_g0 trt_g0
*marginal counts for control and treatment, for gender=1
margins i.treat, at(gender=1)
gen cou_cont_g1=exp(2.0276 - .4888359)
gen cou_trt_g1=exp(2.0276 + .4674933 - .4888359)
gen p_cont_g1 = 1 - ( exp(.2195245 + .4601969) / (1+exp(.2195245 + .4601969) ) )
gen p_trt_g1 = 1 - ( exp(.2195245 + .4923645 + .4601969) /
(1+exp(.2195245 + .4923645 + .4601969) ) )
gen cont_g1 = cou_cont_g1 * p_cont_g1
gen trt_g1 = cou_trt_g1 * p_trt_g1
sum cont_g1 trt_g1
* --> Finally, we compute averaged marginal counts as a weighted sum
of the predictions for both gender categories:
*marginal counts for control and treatment, averaged over gender
margins i.treat
sum gender
gen cont = (cont_g0 * (1-r(mean)) + cont_g1 * r(mean))
gen trt = (trt_g0 * (1-r(mean)) + trt_g1 * r(mean))
sum cont trt
Note that the same principle applies to average marginal effects (as
opposed to marginal counts). In the above example with a binary
variable, the marginal effects are simply the differences between
marginal counts for the two treatment arms. For example,
margins, dydx(treat)
would correspond to:
margins, dydx(treat) at(gender=0)
margins, dydx(treat) at(gender=1)
sum gender
di .607908 * (1-r(mean)) + .191445 * r(mean)
However, I doubt that average marginal effects are a good choice for
continuous predictors in non-linear models, especially so in a model
like the ZINB. Consider the following example with a zero-inflated
outcome and one continuous predictor:
clear
set obs `=1e4'
set seed 1234
gen x = rnormal()
//Count component
gen xb = 2 + 0.2*x
gen a = .5
gen ia = 1/a
gen exb = exp(xb)
gen xg = rgamma(ia, a)
gen xbg = exb * xg
gen nby = rpoisson(xbg)
//Binary component
gen pi =1/(1+exp(-(0.5*x + 0.2)))
gen bernoulli = runiform()>pi
gen zy = bernoulli*nby
rename zy y
//Model fit
zinb y x, inflate(x)
The average marginal effect of x would be:
margins, dydx(x)
which indicates that per unit increase in x, we expect y to decrease
by roughly a fifth of a count, i.e. we would assume a linear effect.
However, if we visualize the effect of x on y by plotting fitted
values of y
predict yhat
as a function of x:
line yhat x, sort
we can see that the effect of x on y is all but linear in this example
and that the average marginal effect is a poor representation of this
effect. Average marginal effects of this kind should probably be
avoided altogether in non-linear models. If you would be interested in
the relation of x on y at certain values of age, you could do several
predictions and change the value for age in each of the predictions.
The same strategy would work if you had an age*x interaction effect.
Joerg
On Wed, Dec 19, 2012 at 12:08 PM, Neeraj Iyer <[email protected]> wrote:
> Thank you Mr. Luedicke. I ran the codes that you sent. Here is my
> understanding of the output:
>
> 1) When gender is held at value=1, the marginal effect of x1 is for
> either all men or all women.
> 2) When gender is held at its mean, the marginal effect of x1 is
> controlling for gender based on the proportion of those coded as '1'.
> 3) When gender is averaged over men and women, the marginal effect of
> x1 is the arithmetic mean of the marginal effect of men and that of
> women.
>
> I reckon from a substantive standpoint, 1) or 3) are more informative
> than 2). If I were interested in the marginal effect of x1 at 10 year
> intervals of age, would it then be prudent to use 3)?
>
> Thank you,
>
> Regards,
> Neeraj
>
> On Fri, Dec 14, 2012 at 11:05 AM, Joerg Luedicke
> <[email protected]> wrote:
>>
>> clear
>> set obs `=1e4'
>> set seed 1234
>> gen x1 = rnormal()
>> gen gender = runiform()>.5
>>
>> //Count component
>> gen xb = 2 + 0.5*x1 - 0.5*gender
>> gen a = .5
>> gen ia = 1/a
>> gen exb = exp(xb)
>> gen xg = rgamma(ia, a)
>> gen xbg = exb * xg
>> gen nby = rpoisson(xbg)
>>
>> //Binary component
>> gen pi =1/(1+exp(-(0.5*x1 + 0.5*gender + 0.2)))
>> gen bernoulli = runiform()>pi
>> gen zy = bernoulli*nby
>> rename zy y
>>
>> //Model fit
>> zinb y x1 gender, inflate(x1 gender)
>>
>> //OP's -margins- statements
>> margins, dydx(x1) at(gender=1)
>> margins, dydx(x1) at((mean) gender)
>>
>> //Now consider the following which is similar to your second statement:
>> sum gender
>> margins, dydx(x1) at(gender=.4983)
>>
>> //And compare it to the following, where the marginal effects are
>> averaged over men and women:
>> margins, dydx(x1)
>
>
>
>
> --
> Neeraj
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/