
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: Problem with IV regression and two-way clustering


From   "Schaffer, Mark E" <[email protected]>
To   <[email protected]>
Subject   RE: st: Problem with IV regression and two-way clustering
Date   Fri, 28 Sep 2012 16:15:13 +0100

Tobias,

The cluster-robust approach is nonparametric in the sense that the VCE
is robust to arbitrary within-cluster correlation.  That's fine if
you've got enough clusters to be reasonably happy that the asymptotics
kick in, but I don't think you do.
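
A small simulation can make the "enough clusters" point concrete. The sketch below is not from the thread: it assumes an intercept-only model (the cluster-robust SE of a sample mean), equal cluster sizes, and a simple random-effects error structure, and the function names are hypothetical. It compares how much the estimated SE itself varies across repeated samples with 14 versus 200 clusters.

```python
import random
import statistics


def cluster_robust_se_of_mean(clusters):
    """Cluster-robust SE of the sample mean (intercept-only regression),
    without any small-sample correction."""
    n_obs = sum(len(c) for c in clusters)
    grand_mean = sum(sum(c) for c in clusters) / n_obs
    # Sum the residuals within each cluster, square, and add across clusters.
    meat = sum((sum(c) - len(c) * grand_mean) ** 2 for c in clusters)
    return meat ** 0.5 / n_obs


def simulate_se_spread(n_clusters, cluster_size=20, reps=300, seed=1):
    """Relative spread (sd / mean) of the estimated SE across repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        clusters = []
        for _ in range(n_clusters):
            shock = rng.gauss(0, 1)  # error component shared within the cluster
            clusters.append([shock + rng.gauss(0, 1) for _ in range(cluster_size)])
        estimates.append(cluster_robust_se_of_mean(clusters))
    return statistics.stdev(estimates) / statistics.mean(estimates)


few = simulate_se_spread(14)
many = simulate_se_spread(200)
print(f"relative spread of the SE estimate, 14 clusters:  {few:.3f}")
print(f"relative spread of the SE estimate, 200 clusters: {many:.3f}")
```

Under these assumptions the SE estimate is noticeably more variable with 14 clusters than with 200. The exact numbers depend on the assumed error structure and should not be read as a statement about any particular data set.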

A parametric approach means that instead of allowing for arbitrary
within-cluster correlation, you model and estimate it.  In your case,
for example, you might estimate the intra-class correlations and then
use the "Moulton factor" (a.k.a. the "design effect") to adjust the SEs.
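
As a rough illustration of the arithmetic, the sketch below computes the Moulton factor in the form given in Angrist & Pischke (2009, chapter 8), sqrt(1 + [V(n_g)/n_bar + n_bar - 1] * rho_x * rho), where rho is the intra-class correlation of the errors and rho_x that of the regressor. The function name, the use of the population variance for V(n_g), the equal cluster sizes, and the ICC value of 0.01 are all assumptions made for illustration, not estimates from the data discussed here.

```python
import statistics


def moulton_factor(cluster_sizes, rho_error, rho_x=1.0):
    """Factor by which conventional SEs are understated:
    sqrt(1 + [V(n_g)/n_bar + n_bar - 1] * rho_x * rho_error).

    rho_x = 1 corresponds to a regressor that is constant within clusters,
    e.g. a variable measured at the regional level.
    """
    n_bar = statistics.fmean(cluster_sizes)
    var_n = statistics.pvariance(cluster_sizes)  # assumption: population variance
    variance_ratio = 1 + (var_n / n_bar + n_bar - 1) * rho_x * rho_error
    return variance_ratio ** 0.5


# Hypothetical setup: 14 equal-sized regions with roughly 170,000 observations
# in total, and a purely illustrative error ICC of 0.01.
sizes = [12143] * 14
factor = moulton_factor(sizes, rho_error=0.01)
print(f"Moulton factor: {factor:.2f}")
```

With clusters this large, even a very small intra-class correlation inflates the SEs substantially. Note that the formula is derived for a simple regression setting, so applying it to a fixed-effects panel model with individual-level controls is only an approximation.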

Angrist & Pischke's Mostly Harmless Econometrics (2009, chapter 8) has a
good discussion.  Steve Pischke's website has an ungated extract here:
http://econ.lse.ac.uk/staff/spischke/mhe/ex_ch8.pdf

HTH,
Mark

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> Tobias Pfaff
> Sent: Friday, September 28, 2012 3:11 PM
> To: [email protected]
> Subject: RE: st: Problem with IV regression and two-way clustering
> 
> Thanks Mark.
> But what do you mean by "parametric approach"?
> 
> Regards,
> Tobias
> 
> 
> > -----Original Message-----
> > From: [email protected] 
> > [mailto:[email protected]] "Schaffer, Mark E"
> <[email protected]>
> > Sent: Fri, 28 Sep 2012 12:23:38 +0100
> > To: [email protected]
> > Subject: Re: st: Problem with IV regression and two-way clustering
> 
> > Tobias,
> 
> > My reaction is that 14 clusters is too small.  Consistency of the
> > cluster-robust VCE requires the number of clusters to go to infinity,
> > and 14 is just not very far on the way to infinity.  You note that with
> > a small number of clusters, the SEs are biased downwards, but the
> > problem isn't just bias - you are going to get noisy estimates of the
> > SEs, i.e., in repeated samples with 14 clusters they can be all over the
> > place.
> 
> > You might instead want to investigate a parametric approach to the
> > problem...?
> 
> > HTH,
> > Mark
> 
> > -----Original Message-----
> > From: [email protected] 
> > [mailto:[email protected]] On Behalf Of 
> > Tobias Pfaff
> > Sent: Thursday, September 27, 2012 9:30 PM
> > To: [email protected]
> > Subject: Re: st: Problem with IV regression and two-way clustering
> > 
> > Dear Austin,
> > 
> > Yes, some individuals move across regions.
> > If I do the IV regression with two-way clustering, I just find it strange
> > that the tests point to an invalid instrument, given the rather high
> > correlation of the focus variable and the instrument.
> > 
> > Regards,
> > Tobias
> > 
> > ________________________________________
> > From Austin Nichols <[email protected]>
> > To [email protected]
> > Subject Re: st: Problem with IV regression and two-way clustering
> > Date Thu, 27 Sep 2012 16:03:38 -0400
> > ________________________________________
> > 
> > Are individuals moving across regions? If not, the pid clustering is
> > subsumed in region, and you need only cluster at the region level.
> > You might consider 2-d clustering by region and year as well.
> > Clustering by pid is not enough; you have strong correlation of errors
> > and predictors within region across people.
> > 
> > On Thu, Sep 27, 2012 at 3:29 PM, Tobias Pfaff
> > <[email protected]> wrote:
> > > Dear Statalisters,
> > >
> > > I would kindly ask you for comments on an instrumental-variables
> > > regression with (two-way) clustered standard errors, which is a
> > > challenge for me. I'm afraid that the whole problem cannot be written
> > > in just a few lines. Below is the whole story (which is hopefully
> > > interesting to some of you).
> > >
> > > Any help is greatly appreciated!
> > >
> > > Now the setting:
> > >
> > > Unbalanced individual panel data set, single country
> > > Obs.: 170,000
> > > Individuals: 28,000
> > > Regions: 14
> > > Years: 9
> > > Dependent variable measured on the individual level
> > > Independent variable of interest (focusvar) measured on the regional level
> > > Further control variables: 10, all at the individual level, plus region
> > > and year dummies (20 dummies)
> > >
> > > I use individual fixed effects and I cluster on the individual level to
> > > control for correlation of the errors over time and get the result that
> > > my focus variable is significant:
> > > -xtivreg2 depvar focusvar controlvars, fe cluster(pid)-
> > >
> > > My focus variable is aggregated at a higher level (region) than the
> > > dependent variable (individual), and I know from Moulton (1990) that my
> > > standard errors can be biased downwards dramatically if I do not cluster
> > > at the regional level. Additionally, Donald and Lang (2007) say that
> > > without clustering on the regional level, I dramatically overstate the
> > > significance of the coefficients. Therefore, I use two-way clustering on
> > > the individual and on the regional level:
> > > -xtivreg2 depvar focusvar controlvars, fe cluster(pid region)-
> > >
> > > Now my focus variable is insignificant. However, the number of clusters
> > > is small (14), which again leads to biased results (Donald and Lang
> > > 2007). Cameron et al. (2008) tell me that "With a small number of
> > > clusters the cluster-robust standard errors are downwards biased" (p.
> > > 414). Since my focus variable is already insignificant, I would expect
> > > the coefficient to be even more insignificant if I corrected for the
> > > bias induced by the small number of clusters, and I conclude that I find
> > > no evidence for significance.
> > >
> > > Now comes the challenge (as if it has not yet been enough):
> > > I want to do an IV regression to make sure that my results are not
> > > influenced by endogeneity bias. I found a variable on the regional level
> > > which is theoretically a fine instrument for my regional focus variable.
> > > The correlation between the focus variable and the instrument is .60.
> > >
> > > I now estimate the IV model with two-way clustered standard errors:
> > > -xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(pid region) first-
> > >
> > > The size of the coefficient of my focus variable has decreased. The
> > > standard errors have increased drastically, and the coefficient is far
> > > from significant. In the first-stage regression, the instrument is not
> > > significant. The tests say that the instrument is weak and I cannot
> > > reject the null of underidentification. I interpret this as evidence
> > > that I have a bad instrument or that my focus variable is not endogenous.
> > >
> > > However, a different picture appears when I only cluster at the
> > > individual level:
> > > -xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(pid) first-
> > >
> > > The standard errors of my focus variable are still much larger than the
> > > non-IV estimates, but smaller compared to IV with two-way clustering.
> > > The focus variable is again not significant. The instrument is highly
> > > significant in the first-stage regression. The tests indicate that the
> > > hypotheses of a weak instrument and of underidentification can be
> > > rejected. I would interpret this as evidence that my instrument is valid
> > > and that my focus variable is endogenous.
> > >
> > > Conclusion:
> > > My interpretation is that the results generally suggest that my focus
> > > variable is not significant.
> > >
> > > Open questions:
> > > Is my interpretation wrong?
> > > Is my instrument good or bad - should I trust the results from the
> > > one-way or two-way clustering for the IV approach?
> > > In case I want to cluster on the regional level and correct for the bias
> > > due to a small number of clusters, I could use wild-bootstrapping as
> > > proposed by Cameron et al. (2008), but does that work for IV as well?
> > >
> > > Thanks very much for any clarification,
> > > Tobias
> > >
> > > Cited literature:
> > > Cameron, Gelbach, Miller (2008), Bootstrap-Based Improvements for
> > > Inference with Clustered Errors. The Review of Economics and Statistics,
> > > 90 (3), 414-427.
> > > Donald, Lang (2007), Inference with Difference-in-Differences and Other
> > > Panel Data. The Review of Economics and Statistics, 89 (2), 221-233.
> > > Moulton (1990), An Illustration of a Pitfall in Estimating the Effects
> > > of Aggregate Variables on Micro Units. The Review of Economics and
> > > Statistics, 72 (2), 334-338.
> > 
> > 
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/faqs/resources/statalist-faq/
> > *   http://www.ats.ucla.edu/stat/stata/
> > 





