Title:  Compute standard errors with margins
Author: Jeff Pitblado, StataCorp
Here is an example using logit:
. webuse margex
(Artificial data for margins)

. logit outcome i.treatment distance, nofvlabel

Iteration 0:  log likelihood = -1366.0718
Iteration 1:  log likelihood = -1257.5623
Iteration 2:  log likelihood = -1244.2136
Iteration 3:  log likelihood = -1242.8796
Iteration 4:  log likelihood = -1242.8245
Iteration 5:  log likelihood = -1242.8245

Logistic regression                                     Number of obs =  3,000
                                                        LR chi2(2)    = 246.49
                                                        Prob > chi2   = 0.0000
Log likelihood = -1242.8245                             Pseudo R2     = 0.0902

------------------------------------------------------------------------------
     outcome | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 1.treatment |    1.42776    .113082    12.63   0.000     1.206124    1.649397
    distance |  -.0047747   .0011051    -4.32   0.000    -.0069406   -.0026088
       _cons |  -2.337762   .0962406   -24.29   0.000     -2.52639   -2.149134
------------------------------------------------------------------------------
I will show how margins computes standard errors (SEs) of average marginal effects (AMEs). The remaining discussion has two parts. The first part describes how to compute AMEs and their SE estimates for factor variables; the second part concerns continuous variables.
Because the AME of a two-level factor variable is just the difference between the two predictive margins, we start by computing the SEs for predictive margins. Here we use margins to compute the predictive margins for the two levels of treatment:
. margins treatment, nofvlabel

Predictive margins                                      Number of obs = 3,000
Model VCE: OIM

Expression: Pr(outcome), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   treatment |
          0  |   .0791146   .0069456    11.39   0.000     .0655016    .0927277
          1  |   .2600204   .0111772    23.26   0.000     .2381135    .2819272
------------------------------------------------------------------------------
We make copies of two matrices from margins's stored results to compare later. r(Jacobian) is the Jacobian matrix, which is explained below. r(V) is the estimated variance matrix that corresponds to the reported predictive margins; the square roots of its diagonal elements are reported in the column labeled “Delta-method std. err.” above.
. matrix rJ = r(Jacobian)

. matrix rV = r(V)

. display sqrt(rV[1,1])
.00694557

. display sqrt(rV[2,2])
.01117718
The delta method allows us to obtain the appropriate SEs of any smooth function of the fitted model parameters. It amounts to pre- and postmultiplying the estimated variance matrix of the fitted model parameters by a Jacobian matrix: the matrix of partial derivatives of the statistics of interest with respect to each of the fitted model parameters.
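In symbols (notation mine, not part of the original output): if $\hat\beta$ denotes the fitted model parameters with estimated variance matrix $\widehat V(\hat\beta)$, stored by Stata in e(V), and $m(\beta)$ is the vector of statistics of interest, then

$$ \widehat V\{m(\hat\beta)\} \;=\; J\,\widehat V(\hat\beta)\,J', \qquad J \;=\; \left.\frac{\partial m(\beta)}{\partial \beta'}\right|_{\beta=\hat\beta} $$

This is the product we compute below as matrix V = Jac*e(V)*Jac'.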
Before we start taking derivatives, let's see how the predictive margins are essentially computed. This will provide us with the formulas from which we will work out the derivatives that go into the Jacobian matrix. In the following, p0 is the variable used to re-create the predictive margin for 0.treatment, and p1 corresponds to 1.treatment. The means of these variables are the predictive margins.
. generate p0 = invlogit(_b[0b.treatment] + _b[distance]*distance + _b[_cons])

. generate p1 = invlogit(_b[1.treatment] + _b[distance]*distance + _b[_cons])

. sum p0 p1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          p0 |      3,000    .0791146    .0212555   .0026954   .0879477
          p1 |      3,000    .2600204    .0688961     .011143   .2867554
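In symbols, letting $\hat\beta_t$ be the coefficient on level $t$ of treatment (zero for the base level), $\hat\beta_d$ the coefficient on distance, and $\hat\beta_c$ the constant, the code above computes

$$ p_{ti} \;=\; \operatorname{invlogit}(\hat\beta_t + \hat\beta_d\,\text{distance}_i + \hat\beta_c), \qquad \text{PM}(t) \;=\; \frac{1}{N}\sum_{i=1}^N p_{ti}, \qquad t = 0, 1 $$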
Now we will make the calculations necessary to reproduce the Jacobian matrix, which we will then use to reproduce the estimated variance matrix. The rows identify what we are taking the partial derivative of; the columns identify what we are taking the partial derivative with respect to. Thus the rows correspond with the predictive margins, and the columns correspond with the fitted model parameters in the logistic regression.
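Because $d\,\operatorname{invlogit}(z)/dz = p(1-p)$ with $p = \operatorname{invlogit}(z)$, the chain rule gives each Jacobian entry as an average (again in my notation):

$$ J[t,j] \;=\; \frac{\partial\,\text{PM}(t)}{\partial \beta_j} \;=\; \frac{1}{N}\sum_{i=1}^N p_{ti}(1-p_{ti})\,x^{(t)}_{ij} $$

where $x^{(t)}_{ij}$ is observation $i$'s value of the $j$th covariate with treatment set to level $t$; this is why the generated variables zero and one stand in for the treatment indicators below.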
. generate dp0dxb = p0*(1-p0)

. generate dp1dxb = p1*(1-p1)

. generate zero = 0

. generate one = 1

. matrix vecaccum J0 = dp0dxb zero zero distance

. matrix vecaccum J1 = dp1dxb zero one distance

. matrix Jac = J0/e(N) \ J1/e(N)

. matrix V = Jac*e(V)*Jac'

. matrix list Jac

Jac[2,4]
              zero       zero   distance      _cons
dp0dxb           0          0  .74390659  .07240388
dp1dxb           0  .18766468  2.1907626  .18766468

. matrix list rJ

rJ[2,4]
                outcome:   outcome:   outcome:   outcome:
                     0b.         1.
               treatment  treatment   distance      _cons
0.treatment            0          0  .74390659  .07240388
1.treatment            0  .18766468  2.1907626  .18766468

. matrix list V

symmetric V[2,2]
            dp0dxb      dp1dxb
dp0dxb   .00004824
dp1dxb  -1.181e-07   .00012493

. matrix list rV

symmetric rV[2,2]
                     0.          1.
              treatment   treatment
0.treatment   .00004824
1.treatment  -1.181e-07   .00012493
Now let's compute the marginal effect of treatment. margins keeps a placeholder for the base outcome of treatment; the Jacobian and variance elements are merely set to 0.
. margins, dydx(treatment) nofvlabel

Average marginal effects                                Number of obs = 3,000
Model VCE: OIM

Expression: Pr(outcome), predict()
dy/dx wrt:  1.treatment

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 1.treatment |   .1809057   .0131684    13.74   0.000     .1550961    .2067153
------------------------------------------------------------------------------
The marginal effect of treatment is just the difference between the predictive margins. Here we take advantage of the fact that the difference of the means is the mean of the differences.
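That is,

$$ \text{AME} \;=\; \text{PM}(1) - \text{PM}(0) \;=\; \frac{1}{N}\sum_{i=1}^N (p_{1i} - p_{0i}) $$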
. generate pdiff = p1 - p0

. sum pdiff

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       pdiff |      3,000    .1809057    .0476499   .0084476   .1988077
The Jacobian of the marginal effect is similarly composed from the previous calculations; we compose only the row for 1.treatment, ignoring the placeholder row for the base level of treatment. Notice that the elements of Jdiff match the 1.treatment row of the r(Jacobian) stored by this margins, dydx(treatment) run; the base-level row, as noted above, is all zeros.
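Differentiation is linear, so the Jacobian of the difference is the difference of the two Jacobian rows, which is exactly what premultiplying by $(-1,\ 1)$ accomplishes:

$$ J_{\text{diff}} \;=\; \frac{\partial\,\{\text{PM}(1) - \text{PM}(0)\}}{\partial \beta'} \;=\; (-1,\ 1)\,J, \qquad V_{\text{diff}} \;=\; J_{\text{diff}}\,\widehat V(\hat\beta)\,J_{\text{diff}}' $$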
. matrix Jdiff = (-1, 1)*Jac

. matrix list Jdiff

Jdiff[1,4]
         zero       zero   distance      _cons
r1          0  .18766468   1.446856  .11526081

. matrix Vdiff = Jdiff*e(V)*Jdiff'

. matrix list Vdiff

symmetric Vdiff[1,1]
            r1
r1   .00017341

The square root of Vdiff[1,1] reproduces the delta-method SE of .0131684 reported above.
The marginal effect of a continuous variable is essentially computed the same way, but there are more derivatives involved. Here we have margins compute the AME of distance and save copies of r(Jacobian) and r(V).
. margins, dydx(distance)

Average marginal effects                                Number of obs = 3,000
Model VCE: OIM

Expression: Pr(outcome), predict()
dy/dx wrt:  distance

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    distance |  -.0006228   .0001444    -4.31   0.000    -.0009058   -.0003399
------------------------------------------------------------------------------

. matrix rJ = r(Jacobian)

. matrix rV = r(V)
The AME of distance is the average of the derivative of the prediction with respect to distance.
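With $p_i$ the fitted probability at observation $i$'s actual covariate values, the chain rule gives

$$ \frac{\partial p_i}{\partial\,\text{distance}} \;=\; \hat\beta_d\,p_i(1-p_i), \qquad \text{AME}(\text{distance}) \;=\; \frac{1}{N}\sum_{i=1}^N \hat\beta_d\,p_i(1-p_i) $$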
. predict p, pr

. generate dpdxb = p*(1-p)

. generate dpdx = dpdxb*_b[distance]

. sum dpdx

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        dpdx |      3,000   -.0006228    .0003231   -.0009766  -.0000128
For the Jacobian, we must compute the partials of dpdx with respect to the model parameters. The partial with respect to the coefficient for distance has an extra piece.
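The product rule shows where the extra piece comes from: the first term below is what d2pdxb2 and vecaccum accumulate, and the second term appears only in the column for $\beta_d$, the coefficient on distance:

$$ \frac{\partial}{\partial \beta_j}\,\hat\beta_d\,p_i(1-p_i) \;=\; \hat\beta_d\,(1-2p_i)\,p_i(1-p_i)\,x_{ij} \;+\; \mathbf{1}\{j = d\}\,p_i(1-p_i) $$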
. generate d2pdxb2 = p*(1-p)*(1-p) - p*p*(1-p)

. matrix vecaccum Jac = d2pdxb2 0b.treatment 1.treatment distance

. matrix Jac = Jac*_b[distance]/e(N)

. sum dpdxb

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       dpdxb |      3,000    .1304451    .0676672    .0026901   .2045267
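The concluding step is not shown above; here is a minimal sketch following the pattern of the factor-variable case (these two commands are mine, but margins arrives at the same quantities internally). We add the extra piece, the mean of dpdxb left in r(mean) by summarize, to the distance column of Jac and then apply the delta method as before:

. matrix Jac[1,3] = Jac[1,3] + r(mean)    // extra piece enters only the distance column

. matrix V = Jac*e(V)*Jac'

Jac should now match the saved rJ (up to row and column names), and the square root of V[1,1] should reproduce the delta-method SE of .0001444 reported for distance above.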
In summary, I have shown how to compute discrete and continuous marginal effects along with their corresponding SE estimates using the delta method. This is essentially what margins does in all cases, except that it often uses numerical derivatives.