In the spotlight: Robust inference
The workhorse of applied research is linear regression. To draw inferences based on the regression models you fit, you need to ensure that the methods for estimating standard errors or otherwise calculating confidence intervals and p-values are robust to violations of the i.i.d. assumption. For instance, if you are after an average treatment effect on the treated (ATET), you probably use difference-in-differences estimation (DID), which is a linear regression. If you want a heterogeneous ATET, you may use linear regression accounting for time and group interactions.
If the parameters you estimate are identified, how much you can rely on them depends on your standard errors. Bertrand, Duflo, and Mullainathan (2004) made this point in the context of DID. They suggested that the right standard errors are cluster–robust standard errors, clustered at the level at which the intervention occurred. Stata's DID commands (xtdidregress, xthdidregress, didregress, and hdidregress) all do this by default.
But when there are few clusters, cluster–robust standard errors are not reliable. This situation is common: many interventions involve only a handful of treated and control groups. What should we do? Many alternatives have arisen recently, and there has been much discussion of cluster–robust inference; see MacKinnon, Nielsen, and Webb (2023) for a guide.
Below, I discuss two alternatives implemented in Stata for cluster–robust inference. I use DID as motivation, but everything I say applies to any linear regression model fit by regress, areg, or xtreg, fe, the most commonly used linear model estimators in Stata. I will treat the data as repeated cross-sections, but all the methods also apply to panel data.
HC2, an old friend
MacKinnon and White (1985) proposed a couple of estimators that improve the finite-sample performance of robust standard errors. One of them, HC2, has been available in regress for many years, and in Stata 18 it is also available for cluster–robust standard errors, with a degrees-of-freedom adjustment suggested by Bell and McCaffrey (2002). This cluster–robust version of HC2 works well when there are few clusters. Let's see it at work in the context of DID:
. webuse smallg, clear
(Simulated data with a small number of groups)

. didregress (outcome x i.b) (treated), group(county) time(year) vce(hc2)

Computing degrees of freedom ...

Treatment and time information

Time variable: year
Control:       treated = 0
Treatment:     treated = 1

-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
      county |         4          2
-------------+---------------------
Time         |
     Minimum |      2011       2013
     Maximum |      2011       2013
-----------------------------------

  (output omitted)

------------------------------------------------------------------------------
             |              Robust HC2
     outcome | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
     treated |
   (Treated  |
         vs  |
  Untreated) |  -.9394987   .1278735    -7.35   0.020    -1.507314   -.3716835
------------------------------------------------------------------------------
Here we needed to type only vce(hc2). By default, didregress thinks in terms of clusters and performs the degrees-of-freedom adjustment. If DID is your focus, I recommend the dedicated commands, such as didregress. But to illustrate how easy it is to obtain the same type of cluster–robust standard errors with other commands, you could fit a linear regression absorbing the intervention group, county, and type
. areg outcome x i.b i.year treated, absorb(county) vce(hc2 county, dfadjust)
and you would get an equivalent result.
If you had used regular cluster–robust standard errors in this case, you would have obtained smaller standard errors and, therefore, overconfident inference.
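To see the difference yourself, you could rerun the estimation without vce(hc2); didregress then defaults to clustering at the county level:

. didregress (outcome x i.b) (treated), group(county) time(year)

With only six counties, the resulting confidence interval is likely too narrow.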
In Stata 18, you can use vce(hc2 clustervar, dfadjust) with regress, areg, or xtreg, fe to get more reliable inference when there are few clusters.
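For example, here are sketches of the equivalent specifications for the other two commands, assuming the same smallg data (with xtreg, you must first declare the group structure with xtset):

. regress outcome x i.b i.year i.county treated, vce(hc2 county, dfadjust)

. xtset county
. xtreg outcome x i.b i.year treated, fe vce(hc2 county, dfadjust)

Both include or absorb the county effects and apply the same Bell and McCaffrey (2002) adjustment.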
Wild cluster bootstrap, a new friend
Another alternative is the wild cluster bootstrap, which works well when there are few clusters, as documented in MacKinnon and Webb (2018). Instead of relying on a large-sample approximation, it builds the distribution of the test statistic directly: it forms many bootstrap samples by randomly flipping the signs of the cluster-level residuals, recomputes the statistic on each sample, and uses the resulting distribution to compute p-values and confidence intervals.
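To fix ideas, here is a minimal hand-rolled sketch of the restricted (null-imposing) wild cluster bootstrap for a single coefficient, in a deliberately simplified model with no absorbed indicators; wildbootstrap implements the real procedure, with refinements, for you:

* Teaching sketch: restricted wild cluster bootstrap for H0: age = 0
use https://www.stata-press.com/data/r18/nlswork, clear
keep if !missing(ln_wage, age)             // common estimation sample
quietly regress ln_wage age, vce(cluster ind_code)
scalar t_obs = _b[age]/_se[age]            // observed t statistic
quietly regress ln_wage                    // restricted fit imposing H0
predict double xb0, xb
predict double u0, residuals
set seed 111
sort ind_code
local extreme 0
forvalues b = 1/999 {
    by ind_code: gen double v = cond(runiform() < .5, -1, 1) if _n == 1
    by ind_code: replace v = v[1]          // one Rademacher draw per cluster
    generate double ystar = xb0 + v*u0     // flip cluster residual signs
    quietly regress ystar age, vce(cluster ind_code)
    if abs(_b[age]/_se[age]) >= abs(t_obs) local ++extreme
    drop v ystar
}
display "wild cluster bootstrap p-value = " (`extreme' + 1)/1000

This sketch omits what wildbootstrap handles for you: absorbed indicators, alternative error weights, and the inversion of the bootstrap tests into confidence intervals.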
Below, I explore whether there is a quadratic effect of age on the log of wages. I cluster at the industry level. For reproducibility, I specify a seed:
. use https://www.stata-press.com/data/r18/nlswork, clear

. wildbootstrap areg ln_wage c.age##c.age, absorb(idcode) cluster(ind_code) rseed(111) nolog

Performing wild cluster bootstrap ...

Wild cluster bootstrap                           Number of obs      = 28,169
Linear regression, absorbing indicators          Number of clusters =     12

Cluster variable: ind_code                       Cluster size:
Error weight: Rademacher                                     min =        52
                                                             avg =    2347.4
                                                             max =      8475

------------------------------------------------------------------------------
     ln_wage |   Estimate        t    p-value     [95% conf. interval]
-------------+----------------------------------------------------------------
 constraints |
     age = 0 |   .0542764     5.94     0.000       .0400869    .0957148
             |
 c.age#c.age |
         = 0 |  -.0006028    -4.65     0.000      -.0010146   -.0004041
------------------------------------------------------------------------------
The wildbootstrap command reports tests of the coefficients against 0. The estimates correspond to the parameters from areg. We could also have specified other tests of linear hypotheses about the areg coefficients, as sketched below. For our particular case, it seems that age has a quadratic effect on the log of wages.
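For instance, a hypothetical call testing a different null for the linear term might look like the one below; I am assuming the test() option here, so check [R] wildbootstrap for the exact syntax for specifying hypotheses:

. wildbootstrap areg ln_wage c.age##c.age, absorb(idcode) cluster(ind_code) test(age = .05) rseed(111)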
If we had used areg, we would have obtained
. areg ln_wage c.age##c.age, absorb(idcode) vce(cluster ind_code)
  (output omitted)
                              (Std. err. adjusted for 12 clusters in ind_code)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0542764   .0091349     5.94   0.000     .0341708    .0743821
             |
 c.age#c.age |  -.0006028   .0001298    -4.65   0.001    -.0008884   -.0003173
             |
       _cons |    .634731   .1628045     3.90   0.002     .2764008    .9930613
------------------------------------------------------------------------------
We reach the same conclusion, but our confidence intervals are now narrower and, with only 12 clusters, probably unreliably so.
wildbootstrap is available for regress, areg, and xtreg, fe.
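For example, here is a sketch of the panel-data version of the model above, assuming the data have been xtset by idcode and that fe is passed as an option, as with xtreg itself:

. wildbootstrap xtreg ln_wage c.age##c.age, fe cluster(ind_code) rseed(111)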
But wait! There's more
When thinking about clustering, you may have multiple nonnested variables defining groups within which observations are not independent. In our wage example above, we could think about clustering at both the industry and the occupation-code level.
We would type
. areg ln_wage c.age##c.age, absorb(idcode) vce(cluster ind_code occ_code)
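Under the hood, the two-way cluster–robust variance is assembled from three one-way pieces (a standard construction; see MacKinnon, Nielsen, and Webb [2023] for details):

V(ind_code, occ_code) = V(ind_code) + V(occ_code) - V(ind_code x occ_code)

where the last term clusters on the intersections of the industry and occupation codes.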
Conclusion
Stata 18 has three new tools tailored for cluster–robust inference. Two of them, HC2 cluster–robust standard errors with a degrees-of-freedom adjustment and the wild cluster bootstrap, are for cases with few clusters. The third helps you when there is nonnested multiway clustering.
You can read more about these methods in the Stata documentation; see [R] regress, [XT] xtreg, fe, [R] wildbootstrap, and, for a discussion in the DID context, [CAUSAL] DID intro.
References
Bell, R. M., and D. F. McCaffrey. 2002. Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology 28: 169–181. https://www150.statcan.gc.ca/n1/pub/12-001-x/2002002/article/9058-eng.pdf.
Bertrand, M., E. Duflo, and S. Mullainathan. 2004. How much should we trust difference-in-differences estimates? Quarterly Journal of Economics 119: 249–275. https://doi.org/10.1162/003355304772839588.
MacKinnon, J. G., M. Ø. Nielsen, and M. D. Webb. 2023. Cluster-robust inference: A guide to empirical practice. Journal of Econometrics 232: 272–299. https://doi.org/10.1016/j.jeconom.2022.04.001.
MacKinnon, J. G., and M. D. Webb. 2018. The wild bootstrap for few (treated) clusters. Econometrics Journal 21: 114–135. https://doi.org/10.1111/ectj.12107.
MacKinnon, J. G., and H. L. White, Jr. 1985. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics 29: 305–325. https://doi.org/10.1016/0304-4076(85)90158-7.
— Enrique Pinzón
Director, Econometrics