In Stata 11, the margins command replaced mfx.
| Title  | Obtaining marginal effects without standard errors |
| Author | May Boggess, StataCorp |
Not every predict option of every estimation command is suitable for calculating the standard errors of the marginal effects, so mfx checks whether the specified predict option is suitable.
A marginal effect is the partial derivative of the prediction function f with respect to each covariate x. The mfx command calculates each of these derivatives numerically. This means that it uses the following approximation for each x_i:
\[
\frac{df}{dx_i} \approx \frac{f(x_i + h) - f(x_i)}{h}
\]
for an appropriately small change h in x_i, holding all the other covariates and the coefficients constant. mfx evaluates this derivative at the mean of each covariate or, if you have used the at() option, at the values specified there.
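To make this concrete, here is a minimal sketch of such a one-sided numerical derivative done by hand after a probit fit to the auto dataset; the dataset, covariates, and step size are chosen purely for illustration, and this is not the exact routine mfx runs internally.

* sketch: hand-rolled numerical marginal effect of mpg at the means
sysuse auto, clear
quietly probit foreign price mpg
summarize price, meanonly
local pbar = r(mean)
summarize mpg, meanonly
local mbar = r(mean)
local h  = 1e-4*`mbar'                     // small change in mpg
local f0 = normal(_b[_cons] + _b[price]*`pbar' + _b[mpg]*`mbar')
local f1 = normal(_b[_cons] + _b[price]*`pbar' + _b[mpg]*(`mbar'+`h'))
display "numerical df/dmpg at the means = " (`f1'-`f0')/`h'
* compare with the dy/dx that mfx, varlist(mpg) reports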
The standard error of the marginal effect is computed by the delta method:
\[
\mathrm{Var}(M_i) = \left(\frac{dM_i}{db}\right)' \mathrm{Var}(b) \left(\frac{dM_i}{db}\right)
\]
where M_i is the marginal effect of the ith independent variable x_i, and the jth component of the vector dM_i/db is the partial derivative of M_i with respect to b_j, the coefficient of the jth independent variable. This is because the marginal effect, evaluated at a point, is a function of the coefficients b_j only.
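Written out componentwise, with b the coefficient vector and Var(b) its estimated variance–covariance matrix e(V), the quadratic form above is just a double sum over the coefficients:

\[
\mathrm{Var}(M_i) = \sum_j \sum_k \frac{dM_i}{db_j}\,\mathrm{Cov}(b_j, b_k)\,\frac{dM_i}{db_k}
\]

so the only new ingredients beyond e(V) are the derivatives dM_i/db_j.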
To calculate dM_i/db_j, mfx uses the usual approximation:
\[
\frac{dM_i}{db_j} \approx \frac{M_i(b_j + h_b) - M_i(b_j)}{h_b}
\]
where h_b is a small change in b_j. Because this is a partial derivative, it is computed holding all the other coefficients constant (at the values estimated by the estimation command) and holding all the covariates constant at the values specified in the mfx command.
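Continuing the probit sketch from above (again, the dataset, variable names, and step size are only illustrative assumptions), the marginal effect of mpg at the means has the closed form normalden(xb)*_b[mpg], so its derivative with respect to the mpg coefficient can be approximated the same way:

* sketch: numerical derivative of the mpg marginal effect with respect
* to the mpg coefficient, covariates held at their means
sysuse auto, clear
quietly probit foreign price mpg
foreach v in price mpg {
    summarize `v', meanonly
    local `v'bar = r(mean)
}
local xb = _b[_cons] + _b[price]*`pricebar' + _b[mpg]*`mpgbar'
local hb = 1e-4*_b[mpg]                    // small change in the mpg coefficient
local M0 = normalden(`xb')*_b[mpg]
local M1 = normalden(`xb' + `hb'*`mpgbar')*(_b[mpg] + `hb')
display "numerical dM_mpg/db_mpg = " (`M1'-`M0')/`hb'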
Problems arise in computing dM_i/db_j when the prediction function f depends on the coefficients in a less than straightforward manner. Let’s look at an example:
. use http://www.stata-press.com/data/r10/hsng2, clear
(1980 Census housing data)
. ivregress 2sls rent pcturban (hsngval = faminc reg2-reg4)
Instrumental variables (2SLS) regression Number of obs = 50
Wald chi2(2) = 90.76
Prob > chi2 = 0.0000
R-squared = 0.5989
Root MSE = 22.166
------------------------------------------------------------------------------
rent | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hsngval | .0022398 .0003284 6.82 0.000 .0015961 .0028836
pcturban | .081516 .2987652 0.27 0.785 -.504053 .667085
_cons | 120.7065 15.22839 7.93 0.000 90.85942 150.5536
------------------------------------------------------------------------------
Instrumented: hsngval
Instruments: pcturban faminc reg2 reg3 reg4
. mfx, predict(pr(200,300)) diagnostics(vce)
Check prediction function does not depend on dependent variables,
covariance matrix, or stored scalars.
dfdx:
.00001126 .00040971
dfdx, after resetting dependent variables, covariance matrix, and stored scalars:
. .
Relative difference = .
warning: predict() expression pr(200,300) unsuitable for standard-error calculation;
option nose imposed
Marginal effects after ivregress
y = Pr(200<rent<300) (predict, pr(200,300))
= .9399585
-------------------------------------------------------------------------------
variable | dy/dx X
---------------------------------+---------------------------------------------
hsngval | .0000113 48484
pcturban | .0004097 66.9491
-------------------------------------------------------------------------------
The diagnostics(vce) option shows us how mfx came to the conclusion that standard errors are not appropriate.
mfx checks this by setting the covariance matrix to the identity matrix, setting all the dependent variables to zero, and blanking out various scalars stored in the estimation results. It then recalculates the marginal effects. If it gets the same results as the first time, it concludes that the prediction function does not depend on any of those quantities; if the results change, mfx concludes there is a problem.
In our example, the results certainly changed. What happened? Well, the function pr(200,300) depends on e(rmse), which is stored as a scalar and therefore was blanked out; that is why we got an empty answer the second time around. And it really is a problem for the prediction function to depend on e(rmse): e(rmse) depends on the coefficients, and when mfx calculates the derivative of f with respect to a coefficient, it assumes that f depends on the coefficients only through the coefficient matrix e(b).
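If you want to see that dependence directly: for a linear model, predict's pr(a,b) is essentially Φ((b − xb)/e(rmse)) − Φ((a − xb)/e(rmse)), so a hand calculation like the sketch below, run right after the ivregress fit above with the covariates at their means, should come out near the .9399585 that mfx reports. This is a simplification for illustration, not mfx's internal code.

* sketch: reproduce pr(200,300) at the means by hand to expose e(rmse)
summarize hsngval, meanonly
local hvbar = r(mean)
summarize pcturban, meanonly
local pubar = r(mean)
local xb = _b[_cons] + _b[hsngval]*`hvbar' + _b[pcturban]*`pubar'
display normal((300-`xb')/e(rmse)) - normal((200-`xb')/e(rmse))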
What if the two matrices were the same but contained empty values, so that the relative difference is not zero? It is probably a good idea to figure out why those marginal effects came up empty. Often it is because you are trying to evaluate the marginal effect at a point where the values of the prediction function are not reasonable. So I would use the at() option (along with nose to save some time) and calculate the marginal effects at points near the one where you were trying to calculate them. Sometimes a small change in the point makes a big difference. As a last resort, you can use the varlist() option of mfx so that the marginal effects that were empty are not calculated, and the test will pass.
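As a sketch of that workflow for the ivregress example above (the at() values below are invented for illustration; pick ones that are sensible for your data), you might try:

mfx, predict(pr(200,300)) at(mean hsngval=40000) nose
mfx, predict(pr(200,300)) at(mean hsngval=60000) nose
* last resort: compute only the marginal effects that were not empty
mfx, predict(pr(200,300)) varlist(pcturban)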
What if the difference between the two was very, very small, say 10^-10? That is not small enough to pass the test, but surely such a difference is minor. That may well be true. I would try the same approach as in the previous paragraph, using the at() option to change the point, and see whether that makes a difference. Here is an example like that:
. use http://www.stata-press.com/data/r10/abdata, clear
. set matsize 800
. xtabond n l(0/1).w l(0/2).(k ys) yr1980-yr1984 year, lags(2) noconstant
Arellano-Bond dynamic panel-data estimation Number of obs = 611
Group variable: id Number of groups = 140
Time variable: year
Obs per group: min = 4
avg = 4.364286
max = 6
Number of instruments = 40 Wald chi2(15) = 1627.13
Prob > chi2 = 0.0000
One-step results
------------------------------------------------------------------------------
n | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
n |
L1. | .7080866 .1455545 4.86 0.000 .4228051 .9933681
L2. | -.0886343 .0448479 -1.98 0.048 -.1765346 -.000734
w |
--. | -.605526 .0661129 -9.16 0.000 -.735105 -.4759471
L1. | .4096717 .1081258 3.79 0.000 .1977491 .6215943
k |
--. | .3556407 .0373536 9.52 0.000 .2824289 .4288525
L1. | -.0599314 .0565918 -1.06 0.290 -.1708493 .0509865
L2. | -.0211709 .0417927 -0.51 0.612 -.1030831 .0607412
ys |
--. | .6264699 .1348009 4.65 0.000 .3622651 .8906748
L1. | -.7231751 .1844696 -3.92 0.000 -1.084729 -.3616214
L2. | .1179079 .1440154 0.82 0.413 -.1643572 .400173
yr1980 | .0113066 .0140625 0.80 0.421 -.0162554 .0388686
yr1981 | -.0212183 .0206559 -1.03 0.304 -.0617031 .0192665
yr1982 | -.034952 .022122 -1.58 0.114 -.0783103 .0084063
yr1983 | -.0287094 .0251536 -1.14 0.254 -.0780096 .0205909
yr1984 | -.014862 .0284594 -0.52 0.602 -.0706414 .0409174
------------------------------------------------------------------------------
Instruments for differenced equation
GMM-type: L(2/.).n
Standard: D.w LD.w D.k LD.k L2D.k D.ys LD.ys L2D.ys D.yr1980 D.yr1981 D.yr1982
D.yr1983 D.yr1984
. mfx, at(mean L.n=-0.06) diag(vce)
Check prediction function does not depend on dependent variables,
covariance matrix, or stored scalars.
dfdx:
.70808656 -.08863433 -.60552603 .40967169 .35564067 -.0599314 -.02117091
.62646995 -.7231751 .11790789 .01130656 -.02121832 -.03495199 -.02870935
-.01486203
dfdx, after resetting dependent variables, covariance matrix, and stored scalars:
.70808656 -.08863433 -.60552603 .40967169 .35564067 -.0599314 -.02117091
.62646995 -.7231751 .11790789 .01130656 -.02121832 -.03495199 -.02870935
-.01486203
Relative difference = 0
Marginal effects after xtabond
y = Linear prediction (predict)
= -.84471245
------------------------------------------------------------------------------
variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
---------+--------------------------------------------------------------------
L.n | .7080866 .14555 4.86 0.000 .422805 .993368 -.06
L2.n | -.0886343 .04485 -1.98 0.048 -.176535 -.000734 1.09584
w | -.605526 .06611 -9.16 0.000 -.735105 -.475947 3.14957
L.w | .4096717 .10813 3.79 0.000 .197749 .621594 3.12676
k | .3556407 .03735 9.52 0.000 .282429 .428852 -.502119
L.k | -.0599314 .05659 -1.06 0.290 -.170849 .050987 -.429181
L2.k | -.0211709 .04179 -0.51 0.612 -.103083 .060741 -.391757
ys | .6264699 .1348 4.65 0.000 .362265 .890675 4.59385
L.ys | -.7231751 .18447 -3.92 0.000 -1.08473 -.361621 4.62901
L2.ys | .1179079 .14402 0.82 0.413 -.164357 .400173 4.66607
yr1980*| .0113066 .01406 0.80 0.421 -.016255 .038869 .225859
yr1981*| -.0212183 .02066 -1.03 0.304 -.061703 .019266 .229133
yr1982*| -.034952 .02212 -1.58 0.114 -.07831 .008406 .229133
yr1983*| -.0287094 .02515 -1.14 0.254 -.07801 .020591 .12766
yr1984*| -.014862 .02846 -0.52 0.602 -.070641 .040917 .057283
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
If you want to force mfx to compute the standard errors of the marginal effects even though this test failed, you can do so with the force option. If you cannot find a better point, but the difference was very small at every point you tried, and you have convinced yourself by examining the formula for the prediction function that it should not depend on anything but the covariate values and the coefficient matrix e(b), then you may be confident enough to use force.
But remember: if diag(vce) shows a large relative difference (say, bigger than 10^-2), the standard errors produced with force will probably be wrong, because mfx cannot take into account dependence on the coefficients that is not through e(b).
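For the first example above, that would look like the sketch below; whether the resulting standard errors can be trusted is exactly the judgment call just described.

mfx, predict(pr(200,300)) force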