In the spotlight: Robust inference
The workhorse of applied research is linear regression. To draw inferences based on the regression models you fit, you need to ensure that the methods for estimating standard errors or otherwise calculating confidence intervals and p-values are robust to violations of the i.i.d. assumption. For instance, if you are after an average treatment effect on the treated (ATET), you probably use difference-in-differences estimation (DID), which is a linear regression. If you want a heterogeneous ATET, you may use linear regression accounting for time and group interactions.
If the parameters you estimate are identified, how much you can rely on them depends on your standard errors. Bertrand, Duflo, and Mullainathan (2004) made this point in the context of DID. They suggested that the right standard errors are cluster–robust standard errors, clustered at the level at which the intervention occurred. Stata's DID commands (xtdidregress, xthdidregress, didregress, and hdidregress) all do this by default.
But when there are few clusters, cluster–robust standard errors are not reliable. This situation is common: many interventions involve only a handful of treated and control groups. What should we do? Many alternatives have arisen recently, and there has been much discussion of cluster–robust inference; see MacKinnon, Nielsen, and Webb (2023) for a guide.
Below, I discuss two alternatives implemented in Stata for cluster–robust inference. I use DID as motivation, but everything I say applies to any linear regression model fit by regress, areg, or xtreg, fe, the most commonly used linear model estimators in Stata. I will treat the data as repeated cross-sections, but all the methods also apply to panel data.
HC2, an old friend
MacKinnon and White (1985) proposed a couple of estimators that improve the finite-sample performance of robust standard errors. One of them, HC2, has been available in regress for many years, and in Stata 18 it is also available for cluster–robust standard errors, with a degrees-of-freedom adjustment suggested by Bell and McCaffrey (2002). This cluster–robust version of HC2 works well when there are few clusters. Let's see it at work in the context of DID:
. webuse smallg, clear
(Simulated data with a small number of groups)

. didregress (outcome x i.b) (treated), group(county) time(year) vce(hc2)

Computing degrees of freedom ...

Treatment and time information

Time variable: year
Control:       treated = 0
Treatment:     treated = 1

-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
      county |         4          2
-------------+---------------------
Time         |
     Minimum |      2011       2013
     Maximum |      2011       2013
-----------------------------------

  (output omitted)

------------------------------------------------------------------------------
             |              Robust HC2
     outcome | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
     treated |
   (Treated  |
         vs  |
  Untreated) |  -.9394987   .1278735    -7.35   0.020    -1.507314   -.3716835
------------------------------------------------------------------------------
Here we needed to type only vce(hc2). By default, didregress thinks in terms of clusters and performs the degrees-of-freedom adjustment. If DID is your focus, I recommend the dedicated commands, such as didregress. But to illustrate how easy it is to obtain the same type of cluster–robust standard errors with other commands, you could fit a linear regression absorbing the intervention group, county, and type
. areg outcome x i.b i.year treated, absorb(county) vce(hc2 county, dfadjust)
and you would get an equivalent result.
If you had used regular cluster–robust standard errors in this case, you would have obtained smaller standard errors and, therefore, overconfident inference.
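To see the difference yourself, you could rerun the estimation without vce(hc2); didregress then defaults to clustering at the county level:

. didregress (outcome x i.b) (treated), group(county) time(year)

With only six counties, the resulting confidence interval is likely too narrow.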
In Stata 18, you can use vce(hc2 clustervar, dfadjust) with regress, areg, or xtreg, fe to get more reliable inference when there are few clusters.
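For example, here are sketches of the equivalent specifications for the other two commands, assuming the same smallg data (with xtreg, you must first declare the group structure with xtset):

. regress outcome x i.b i.year i.county treated, vce(hc2 county, dfadjust)

. xtset county
. xtreg outcome x i.b i.year treated, fe vce(hc2 county, dfadjust)

Both include or absorb the county effects and apply the same Bell and McCaffrey (2002) adjustment.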
Wild cluster bootstrap, a new friend
Another alternative is the wild cluster bootstrap, which works well when there are few clusters, as documented in MacKinnon and Webb (2018). Instead of relying on a large-sample approximation, it builds the distribution of the test statistic directly: it forms many bootstrap samples by randomly flipping the signs of the cluster-level residuals, recomputes the statistic on each sample, and uses the resulting distribution to compute p-values and confidence intervals.
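To fix ideas, here is a minimal hand-rolled sketch of the restricted (null-imposing) wild cluster bootstrap for a single coefficient, in a deliberately simplified model with no absorbed indicators; wildbootstrap implements the real procedure, with refinements, for you:

* Teaching sketch: restricted wild cluster bootstrap for H0: age = 0
use https://www.stata-press.com/data/r18/nlswork, clear
keep if !missing(ln_wage, age)             // common estimation sample
quietly regress ln_wage age, vce(cluster ind_code)
scalar t_obs = _b[age]/_se[age]            // observed t statistic
quietly regress ln_wage                    // restricted fit imposing H0
predict double xb0, xb
predict double u0, residuals
set seed 111
sort ind_code
local extreme 0
forvalues b = 1/999 {
    by ind_code: gen double v = cond(runiform() < .5, -1, 1) if _n == 1
    by ind_code: replace v = v[1]          // one Rademacher draw per cluster
    generate double ystar = xb0 + v*u0     // flip cluster residual signs
    quietly regress ystar age, vce(cluster ind_code)
    if abs(_b[age]/_se[age]) >= abs(t_obs) local ++extreme
    drop v ystar
}
display "wild cluster bootstrap p-value = " (`extreme' + 1)/1000

This sketch omits what wildbootstrap handles for you: absorbed indicators, alternative error weights, and the inversion of the bootstrap tests into confidence intervals.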
Below, I explore whether there is a quadratic effect of age on the log of wages. I cluster at the industry level. For reproducibility, I specify a seed:
. use https://www.stata-press.com/data/r18/nlswork, clear

. wildbootstrap areg ln_wage c.age##c.age, absorb(idcode) cluster(ind_code) rseed(111) nolog

Performing wild cluster bootstrap ...

Wild cluster bootstrap                           Number of obs      = 28,169
Linear regression, absorbing indicators          Number of clusters =     12

Cluster variable: ind_code                       Cluster size:
Error weight: Rademacher                                     min =        52
                                                             avg =    2347.4
                                                             max =      8475

------------------------------------------------------------------------------
     ln_wage |   Estimate        t    p-value     [95% conf. interval]
-------------+----------------------------------------------------------------
 constraints |
     age = 0 |   .0542764     5.94     0.000       .0400869    .0957148
             |
 c.age#c.age |
         = 0 |  -.0006028    -4.65     0.000      -.0010146   -.0004041
------------------------------------------------------------------------------
The wildbootstrap command reports tests of the coefficients against 0. The estimates correspond to the parameters from areg. We could also have specified other tests of linear hypotheses about the areg coefficients, as sketched below. For our particular case, it seems that age has a quadratic effect on the log of wages.
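For instance, a hypothetical call testing a different null for the linear term might look like the one below; I am assuming the test() option here, so check [R] wildbootstrap for the exact syntax for specifying hypotheses:

. wildbootstrap areg ln_wage c.age##c.age, absorb(idcode) cluster(ind_code) test(age = .05) rseed(111)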
If we had used areg, we would have obtained
. areg ln_wage c.age##c.age, absorb(idcode) vce(cluster ind_code)
  (output omitted)
                              (Std. err. adjusted for 12 clusters in ind_code)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0542764   .0091349     5.94   0.000     .0341708    .0743821
             |
 c.age#c.age |  -.0006028   .0001298    -4.65   0.001    -.0008884   -.0003173
             |
       _cons |    .634731   .1628045     3.90   0.002     .2764008    .9930613
------------------------------------------------------------------------------
We reach the same conclusion, but our confidence intervals are now narrower and, with only 12 clusters, probably unreliably so.
wildbootstrap is available for regress, areg, and xtreg, fe.
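For example, here is a sketch of the panel-data version of the model above, assuming the data have been xtset by idcode and that fe is passed as an option, as with xtreg itself:

. wildbootstrap xtreg ln_wage c.age##c.age, fe cluster(ind_code) rseed(111)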
But wait! There's more
When thinking about clustering, you may have multiple nonnested variables defining groups within which observations are not independent. In our wage example above, we could think about clustering at both the industry and the occupation-code level.
We would type
. areg ln_wage c.age##c.age, absorb(idcode) vce(cluster ind_code occ_code)
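Under the hood, the two-way cluster–robust variance is assembled from three one-way pieces (a standard construction; see MacKinnon, Nielsen, and Webb [2023] for details):

V(ind_code, occ_code) = V(ind_code) + V(occ_code) - V(ind_code x occ_code)

where the last term clusters on the intersections of the industry and occupation codes.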
Conclusion
Stata 18 has three new tools tailored for cluster–robust inference. Two of them, HC2 cluster–robust standard errors with a degrees-of-freedom adjustment and the wild cluster bootstrap, are for cases with few clusters. The third helps you when there is nonnested multiway clustering.
You can read more about these methods in the Stata documentation; see [R] regress, [XT] xtreg, fe, [R] wildbootstrap, and, for a discussion in the DID context, [CAUSAL] DID intro.
References
Bell, R. M., and D. F. McCaffrey. 2002. Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology 28: 169–181. https://www150.statcan.gc.ca/n1/pub/12-001-x/2002002/article/9058-eng.pdf.
Bertrand, M., E. Duflo, and S. Mullainathan. 2004. How much should we trust difference-in-differences estimates? Quarterly Journal of Economics 119: 249–275. https://doi.org/10.1162/003355304772839588.
MacKinnon, J. G., M. Ø. Nielsen, and M. D. Webb. 2023. Cluster-robust inference: A guide to empirical practice. Journal of Econometrics 232: 272–299. https://doi.org/10.1016/j.jeconom.2022.04.001.
MacKinnon, J. G., and M. D. Webb. 2018. The wild bootstrap for few (treated) clusters. Econometrics Journal 21: 114–135. https://doi.org/10.1111/ectj.12107.
MacKinnon, J. G., and H. L. White, Jr. 1985. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics 29: 305–325. https://doi.org/10.1016/0304-4076(85)90158-7.
— Enrique Pinzón
Director, Econometrics