In the spotlight: Weak instruments and wacky confidence intervals
Stata's instrumental-variables regression command, ivregress, is widely used for fitting linear models with endogeneity. estat weakrobust, a new postestimation command in StataNow™ for ivregress, lets users perform tests and construct confidence intervals that are robust to weak instruments.
Weak instruments present a challenge for inference. The approximate normality of the instrumental-variables estimates exploited in conventional inference is inherited in part from the first-stage estimates—that is, from the relationship between the endogenous variables and the instruments.
When this relationship is weak, however, the instrumental-variables estimates depend nonlinearly on the first-stage estimates. Thus, the normality of the latter does not translate to the normality of the former. Standard t tests and associated confidence intervals become misleading (see Andrews, Stock, and Sun [2019]).
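To see the problem concretely, here is a minimal simulation sketch. It is not part of the original analysis: the sample size, the first-stage coefficient of 0.05, the error correlation of 0.8, the number of replications, and the program name weakivsim are all illustrative assumptions. It draws data with a true coefficient of zero and a very weak instrument, fits the model with ivregress 2sls, and records whether the conventional 95% confidence interval covers the truth.

```stata
clear all
set seed 12345

* Hypothetical simulation program; every design value here is an assumption
program define weakivsim, rclass
    drop _all
    set obs 200
    * Instrument and structural error
    generate double z = rnormal()
    generate double u = rnormal()
    * First-stage error, correlated with u so that x is endogenous
    generate double v = 0.8*u + sqrt(1 - 0.8^2)*rnormal()
    * Weak first stage: z barely moves x
    generate double x = 0.05*z + v
    * True coefficient on x is 0
    generate double y = 0*x + u
    quietly ivregress 2sls y (x = z)
    return scalar b = _b[x]
    * 1 if the conventional 95% interval contains the true value 0
    return scalar covered = abs(_b[x]/_se[x]) < invnormal(0.975)
end

simulate b=r(b) covered=r(covered), reps(1000) nodots: weakivsim
summarize b covered, detail
```

The mean of covered estimates the actual coverage of the nominal 95% interval. Under a design this weak, one would expect it to fall short of 0.95 and the distribution of b to look far from normal, but the exact numbers depend on the assumed design.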
To get valid inference, you need to do something different. In this spotlight, I'll show you how to get robust tests and confidence intervals when you have weak instruments. I'll also show you an example of a case where these confidence intervals get weird.
Robust tests
One well-established way to get around the weak-instruments inference problem is the test of Anderson and Rubin (1949). The Anderson–Rubin test statistic has a distribution that does not depend on the first-stage estimates, making it robust to arbitrarily weak instruments.
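One textbook way to see why the statistic does not depend on the first stage: under the null hypothesis that the coefficient on the endogenous regressor equals b0, the outcome minus b0 times the endogenous regressor should be unrelated to the instruments once the exogenous covariates are controlled for. The sketch below applies this idea to the laborsup data (webuse laborsup). The local macro b0 and the variable ytilde are hypothetical names, and this is the homoskedastic F form of the test, which is not necessarily the exact statistic that estat weakrobust reports.

```stata
webuse laborsup, clear

* Null value for the coefficient on other_inc (illustrative choice)
local b0 = 0

* Remove the hypothesized effect of the endogenous regressor
generate double ytilde = fem_inc - (`b0')*other_inc

* Regress on the instrument and the exogenous covariates
regress ytilde male_educ fem_educ kids

* Anderson-Rubin-style test: can the instrument be excluded?
test male_educ
```

Because the example uses a local macro, it should be run as a do-file rather than line by line. Note that no first-stage regression is fit at any point, which is the source of the robustness.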
The related conditional likelihood-ratio (CLR) test of Moreira (2003), which is appropriate when the model is overidentified, has similar properties: conditional on a known statistic, the distribution of the test statistic does not depend on the first-stage estimates.
Both tests can be inverted to produce confidence intervals. We can request Anderson–Rubin and CLR confidence intervals using estat weakrobust, ci after ivregress. For example, we can type
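. webuse laborsup

. ivregress 2sls fem_inc fem_educ kids (other_inc = male_educ)

Instrumental-variables 2SLS regression            Number of obs   =        500
                                                  Wald chi2(3)    =     105.23
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.2814
                                                  Root MSE        =     10.759

------------------------------------------------------------------------------
     fem_inc | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   other_inc |   -.374891    .064153    -5.84   0.000    -.5006286   -.2491535
    fem_educ |   1.274646   .1831334     6.96   0.000     .9157108    1.633581
        kids |  -1.717837   .3564194    -4.82   0.000    -2.416406   -1.019268
       _cons |   24.76533   3.714901     6.67   0.000     17.48425     32.0464
------------------------------------------------------------------------------

. estat weakrobust, ci

------------------------------------------------------------------------------
             |                                              Anderson–Rubin
             | Coefficient                                [95% conf. interval]
-------------+----------------------------------------------------------------
   other_inc |   -.374891                                -.5065501   -.2500736
------------------------------------------------------------------------------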
When instruments are strong, as in this case, robust tests are valid but conservative, typically returning confidence intervals that are slightly wider than conventional intervals.
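A common way to gauge instrument strength, not shown in the original output, is the first-stage summary that estat firststage reports after ivregress. A minimal sketch for the model above:

```stata
webuse laborsup, clear
quietly ivregress 2sls fem_inc fem_educ kids (other_inc = male_educ)

* First-stage summary statistics, including the F statistic
* for the excluded instrument
estat firststage
```

A large first-stage F statistic is the usual informal sign of a strong instrument. Rules of thumb vary, so any cutoff is better treated as a guide than as a formal test.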
When instruments are weak, however, robust tests can deliver results that are noticeably different from conventional tests, even a little wacky. (This is not always the case: Example 6 in [R] ivregress postestimation shows weak instruments leading to an ordinary but wider confidence interval.)
Nonstandard intervals
Unlike conventional confidence intervals, confidence intervals produced by inverting Anderson–Rubin and CLR tests are not guaranteed to be finite intervals. They may cover the whole real line, or they may take the form of a union of several intervals. An Anderson–Rubin confidence interval can even be empty if your model is overidentified (in that case, we recommend using a CLR confidence interval instead).
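To make the idea of test inversion concrete, here is a rough sketch that inverts a heteroskedasticity-robust Anderson–Rubin-style test by hand on the automobile data used below. For each candidate value b0 on a grid, it tests whether the rep78 indicators help explain mpg - b0*price and keeps the values that are not rejected at the 5% level. The grid range and step, and the names b0, ytilde, and p, are assumptions made for illustration. estat weakrobust uses its own grid search and statistics, so the results need not match its output.

```stata
sysuse auto, clear

* Collect (candidate value, p-value) pairs in a temporary dataset
tempname results
tempfile grid
postfile `results' b0 p using `grid'

* Illustrative grid: -0.05 to 0.06 in steps of 0.001
forvalues i = 0/110 {
    local b0 = -0.05 + `i'*0.001
    * Remove the hypothesized effect of price from the outcome
    quietly generate double ytilde = mpg - (`b0')*price
    * Regress on the instruments and the exogenous covariates
    quietly regress ytilde i.rep78 weight length foreign displacement gear_ratio, vce(robust)
    * Joint test that the instruments can be excluded
    quietly testparm i.rep78
    post `results' (`b0') (r(p))
    drop ytilde
}
postclose `results'

* Candidate values not rejected at the 5% level form the confidence set
use `grid', clear
list b0 p if p >= 0.05, noobs
```

If the non-rejected values sit at both ends of the grid with a gap of rejected values in between, the confidence set is a union of two pieces rather than a single finite interval. Nothing in the construction forces the non-rejected values to be contiguous or bounded.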
We can see an example of a CLR confidence interval that takes an irregular form by modeling gas mileage in the trusty Stata 1978 automobile dataset. We want to include the price of a car in the regression as an explanatory variable, but we believe it is endogenous, so we include indicators for the repair record of the car as instruments.
. sysuse auto
(1978 automobile data)

. ivregress 2sls mpg weight length foreign displacement gear_ratio (price = i.rep78), vce(robust)

Instrumental-variables 2SLS regression            Number of obs   =         69
                                                  Wald chi2(6)    =      14.97
                                                  Prob > chi2     =     0.0205
                                                  R-squared       =          .
                                                  Root MSE        =     11.028

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |   .0055594   .0055978     0.99   0.321    -.0054122    .0165309
      weight |  -.0291645    .026713    -1.09   0.275    -.0815211     .023192
      length |   .3741789   .5311569     0.70   0.481    -.6668696    1.415227
     foreign |  -24.52327   22.40424    -1.09   0.274    -68.43477    19.38823
displacement |  -.0410225   .0723021    -0.57   0.570    -.1827321    .1006871
  gear_ratio |   6.350259   6.084446     1.04   0.297    -5.575036    18.27555
       _cons |   1.635244   49.37558     0.03   0.974    -95.13912    98.40961
------------------------------------------------------------------------------
We see that conventional inference does not provide evidence that the coefficient on price is different from 0. We suspect, however, that our instruments are weak and perform a test that is robust to weak instruments:
. estat weakrobust

Test robust to weak instruments
Model VCE: Robust

 ( 1)  price = 0

Cond. likelihood-ratio (CLR) test =   8.29
                       Prob > CLR = 0.0232

Notes: CLR test reported by default because model is overidentified.
       p-value computed by simulation (25,000 replications).
Surprisingly, once weak instruments have been accounted for, we find statistical evidence that price is relevant in this regression: a p-value of 0.0232. Why is this the case? Is the CLR confidence interval narrower than the standard confidence interval? As it turns out, no:
. estat weakrobust, ci rseed(2024)

Searching for CI bounds:
Iteration 0:  Grid points = 500
Iteration 1:  Grid points = 1,000
(CI computed using 1,000 grid points on [-.050419, .061538])

Confidence interval robust to weak instruments
Model VCE: Robust

----------------------------------------------------
             |                        CLR
    Interval | Coefficient    [95% conf. interval]
-------------+--------------------------------------
price        |
           1 |   .0055594         -inf   -.0054904
           2 |                .0007917        +inf
----------------------------------------------------
(Here I have set a random seed for reproducibility. Simulation is used to compute critical values for the CLR test when the model VCE is robust.)
In fact, the confidence interval for the coefficient on price is very wide and actually unbounded: (-∞, -0.00549) ∪ (0.000792, ∞). The CLR test cannot statistically rule out either a positive or a negative effect. It does, however, provide evidence against price having zero effect.
The fact that the CLR test does not impose a normality approximation on the coefficient on price itself gives it its robustness property but also allows for this unusual result.
Notice that estat weakrobust performed inference using CLR by default, as opposed to Anderson–Rubin. This is because the model fit by ivregress is overidentified (it has more instruments than endogenous variables). We can request the Anderson–Rubin confidence interval for this model if we are interested in it. It is similar but wider:
. estat weakrobust, ci ar

Searching for CI bounds:
Iteration 0:  Grid points = 500
Iteration 1:  Grid points = 1,000
(CI computed using 1,000 grid points on [-.050419, .061538])

Confidence interval robust to weak instruments
Model VCE: Robust

----------------------------------------------------
             |                   Anderson–Rubin
    Interval | Coefficient    [95% conf. interval]
-------------+--------------------------------------
price        |
           1 |   .0055594         -inf   -.0027981
           2 |                .0002308        +inf
----------------------------------------------------
We expect the Anderson–Rubin confidence interval to be wider than the CLR confidence interval because the Anderson–Rubin test is a less powerful test when the model is overidentified. When a model is just identified, the two tests are equivalent.
Bottom line
If you are using ivregress and suspect you have weak instruments, you can use estat weakrobust to get valid tests and confidence intervals for the coefficient on the endogenous variable. If instruments really are weak, you may run into unusual confidence intervals: they may be unbounded or take the form of a union of intervals. In these cases, the Stata output of estat weakrobust will contain a hyperlink to a help page that explains how to interpret these intervals.
Read more
For further discussion of inference robust to weak instruments in Stata, see [R] ivregress postestimation.
References
Anderson, T. W., and H. Rubin. 1949. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 20: 46–63. https://doi.org/10.1214/aoms/1177730090.
Andrews, I., J. H. Stock, and L. Sun. 2019. Weak instruments in instrumental variables regression: Theory and practice. Annual Review of Economics 11: 727–753. https://doi.org/10.1146/annurev-economics-080218-025643.
Moreira, M. J. 2003. A conditional likelihood ratio test for structural models. Econometrica 71: 1027–1048. https://doi.org/10.1111/1468-0262.00438.
— Tom Stringham
Senior Econometrician and Software Developer