From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Problem with IV regression and two-way clustering
Date: Thu, 27 Sep 2012 16:03:38 -0400

Tobias Pfaff <tobias.pfaff@uni-muenster.de>:
Are individuals moving across regions? If not, the pid clustering is
subsumed in region, and you need only cluster at the region level.
You might consider 2-d clustering by region and year as well.
Clustering by pid is not enough; you have strong correlation of errors
and predictors within region across people.

On Thu, Sep 27, 2012 at 3:29 PM, Tobias Pfaff <tobias.pfaff@uni-muenster.de> wrote:
> Dear Statalisters,
>
> I would kindly ask you for comments on an instrumental-variables regression
> with (two-way) clustered standard errors, which is a challenge for me.
> I'm afraid that the whole problem cannot be written in just a few lines.
> Below is the whole story (which is hopefully interesting to some of you).
>
> Any help is greatly appreciated!
>
> Now the setting:
>
> Unbalanced individual panel data set, single country
> Obs.: 170,000
> Individuals: 28,000
> Regions: 14
> Years: 9
> Dependent variable measured on the individual level
> Independent variable of interest (focusvar) measured on the regional level
> Further control variables: 10, all at the individual level, plus region and
> year dummies (20 dummies)
>
> I use individual fixed effects and I cluster on the individual level to
> control for correlation of the errors over time and get the result that my
> focus variable is significant:
> -xtivreg2 depvar focusvar controlvars, fe cluster(pid)-
>
> My focus variable is aggregated at a higher level (region) than the
> dependent variable (individual), and I know from Moulton (1990) that my
> standard errors can be biased downwards dramatically if I do not cluster at
> the regional level. Additionally, Donald and Lang (2007) say that without
> clustering on the regional level, I dramatically overstate the significance
> of the coefficients.
> Therefore, I use two-way clustering on the individual
> and on the regional level:
> -xtivreg2 depvar focusvar controlvars, fe cluster(pid region)-
>
> Now my focus variable is insignificant. However, the number of clusters is
> small (14), which again leads to biased results (Donald and Lang 2007).
> Cameron et al. (2008) tell me that "With a small number of clusters the
> cluster-robust standard errors are downwards biased" (p. 414). Since my
> focus variable is already insignificant, I would expect the coefficient to
> be even less significant if I corrected for the bias induced by the
> small number of clusters, and I conclude that I find no evidence for
> significance.
>
> Now comes the challenge (as if it has not yet been enough):
> I want to do an IV regression to make sure that my results are not
> influenced by endogeneity bias. I found a variable on the regional level
> which is theoretically a fine instrument for my regional focus variable. The
> correlation between the focus variable and the instrument is .60.
>
> I now estimate the IV model with two-way clustered standard errors:
> -xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(pid region) first-
>
> The size of the coefficient of my focus variable has decreased. The standard
> errors have increased drastically, and the coefficient is far from
> significant. In the first-stage regression, the instrument is not
> significant. The tests say that the instrument is weak and I cannot reject
> the null of underidentification. I interpret this as evidence that I have a
> bad instrument or that my focus variable is not endogenous.
>
> However, a different picture appears when I only cluster at the individual
> level:
> -xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(pid) first-
>
> The standard errors of my focus variable are still much larger than the
> non-IV estimates, but smaller compared to IV with two-way clustering.
> The focus variable is again not significant. The instrument is highly
> significant in the first-stage regression. The tests indicate that the
> hypotheses of a weak instrument and of underidentification can be rejected.
> I would interpret this as evidence that my instrument is valid and that my
> focus variable is endogenous.
>
> Conclusion:
> My interpretation is that the results generally suggest that my focus
> variable is not significant.
>
> Open questions:
> Is my interpretation wrong?
> Is my instrument good or bad - should I trust the results from the one-way
> or two-way clustering for the IV approach?
> In case I want to cluster on the regional level and correct for the bias due
> to a small number of clusters, I could use wild-bootstrapping as proposed by
> Cameron et al. (2008), but does that work for IV as well?
>
> Thanks very much for any clarification,
> Tobias
>
> Cited literature:
> Cameron, Gelbach, Miller (2008), Bootstrap-Based Improvements for Inference
> with Clustered Errors. The Review of Economics and Statistics, 90 (3),
> 414-427.
> Donald, Lang (2007), Inference with Difference-in-Differences and Other
> Panel Data. The Review of Economics and Statistics, 89 (2), 221-233.
> Moulton (1990), An Illustration of a Pitfall in Estimating the Effects of
> Aggregate Variables on Micro Units. The Review of Economics and Statistics,
> 72 (2), 334-338.
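
For readers who want to try the suggestion in the reply at the top of this post, the following is a rough sketch and not a tested recipe. It reuses the variable names from the quoted message (pid, region, depvar, focusvar, instrumentvar, controlvars). It assumes a numeric variable named year exists in the data, which the post does not confirm, and it assumes the user-written ivreg2 and xtivreg2 packages are installed. The variable mover is a hypothetical helper introduced only for this illustration.

```stata
* Sketch only. Assumes: year is a numeric variable in the data, and
* ivreg2 / xtivreg2 are installed (ssc install ivreg2, ssc install xtivreg2).
xtset pid year

* Check whether any individual is observed in more than one region.
* mover is a hypothetical helper: 1 if a pid's lowest and highest
* region codes differ, 0 otherwise.
bysort pid (region): gen byte mover = region[1] != region[_N]
tabulate mover

* If nobody moves, pid clusters are nested in region clusters,
* so one-way clustering on region already covers them.
xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(region) first

* Two-way clustering by region and year, as also suggested in the reply.
xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(region year) first
```

Neither specification addresses the small number of clusters (14 regions, 9 years) raised in the original message, so the first-stage and weak-instrument diagnostics from these runs should be read with that caveat in mind.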