
Re: st: Problem with IV regression and two-way clustering

From	Austin Nichols <[email protected]>
To	[email protected]
Subject	Re: st: Problem with IV regression and two-way clustering
Date	Thu, 27 Sep 2012 16:03:38 -0400

Tobias Pfaff <[email protected]>:
Are individuals moving across regions? If not, the pid clustering is
subsumed in region, and you need only cluster at the region level.
You might consider 2-d clustering by region and year as well.
Clustering by pid is not enough; you have strong correlation of errors
and predictors within region across people.
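
A minimal sketch of the checks suggested above, not a definitive
specification. It assumes the variable names used in the original post
(pid, region, year, depvar, focusvar, instrumentvar); the local macro
controlvars and its contents x1 x2 are hypothetical placeholders for the
actual control variables. Note that 14 regions and 9 years are both small
numbers of clusters, so the caveats about few clusters raised in the
original post still apply.

```stata
* Hypothetical placeholder for the ten individual-level controls
local controlvars x1 x2

* Declare the panel structure (xtivreg2 requires it)
xtset pid year

* 1. Does anyone change region? Sorting region within pid makes the first
*    and last values the minimum and maximum, so mover==1 means a move.
bysort pid (region): gen byte mover = region[1] != region[_N]
egen byte onepid = tag(pid)
tabulate mover if onepid

* 2. If nobody moves, pid is nested in region: cluster on region only
xtivreg2 depvar (focusvar = instrumentvar) `controlvars', fe cluster(region) first

* 3. Alternative: two-way clustering by region and year
xtivreg2 depvar (focusvar = instrumentvar) `controlvars', fe cluster(region year) first
```

If the tabulation shows no movers, the results from step 2 and from
-cluster(pid region)- should be very close, since the pid dimension adds
nothing beyond region.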

On Thu, Sep 27, 2012 at 3:29 PM, Tobias Pfaff
<[email protected]> wrote:
> Dear Statalisters,
>
> I would kindly ask you for comments on an instrumental-variables regression
> with (two-way) clustered standard errors, which is a challenge for me.
> I'm afraid that the whole problem cannot be written in just a few lines.
> Below is the whole story (which is hopefully interesting to some of you).
>
> Any help is greatly appreciated!
>
> Now the setting:
>
> Unbalanced individual panel data set, single country
> Obs.: 170,000
> Individuals: 28,000
> Regions: 14
> Years: 9
> Dependent variable measured on the individual level
> Independent variable of interest (focusvar) measured on the regional level
> Further control variables: 10, all at the individual level, plus region and
> year dummies (20 dummies)
>
> I use individual fixed effects and I cluster on the individual level to
> control for correlation of the errors over time and get the result that my
> focus variable is significant:
> -xtivreg2 depvar focusvar controlvars, fe cluster(pid)-
>
> My focus variable is aggregated at a higher level (region) than the
> dependent variable (individual), and I know from Moulton (1990) that my
> standard errors can be biased downwards dramatically if I do not cluster at
> the regional level. Additionally, Donald and Lang (2007) say that without
> clustering on the regional level, I dramatically overstate the significance
> of the coefficients. Therefore, I use two-way clustering on the individual
> and on the regional level:
> -xtivreg2 depvar focusvar controlvars, fe cluster(pid region)-
>
> Now my focus variable is insignificant. However, the number of clusters is
> small (14), which again leads to biased results (Donald and Lang 2007).
> Cameron et al. (2008) tell me that "With a small number of clusters the
> cluster-robust standard errors are downwards biased" (p. 414). Since my
> focus variable is already insignificant, I would expect it to be even
> further from significance if I corrected for the bias induced by the
> small number of clusters, and I conclude that I find no evidence for
> significance.
>
> Now comes the challenge (as if that were not enough):
> I want to do an IV regression to make sure that my results are not
> influenced by endogeneity bias. I found a variable on the regional level
> which is theoretically a fine instrument for my regional focus variable. The
> correlation between the focus variable and the instrument is .60.
>
> I now estimate the IV model with two-way clustered standard errors:
> -xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(pid
> region) first-
>
> The size of the coefficient of my focus variable has decreased. The standard
> errors have increased drastically, and the coefficient is far from
> significant. In the first-stage regression, the instrument is not
> significant. The tests say that the instrument is weak and I cannot reject
> the null of underidentification. I interpret this as evidence that I have a
> bad instrument or that my focus variable is not endogenous.
>
> However, a different picture appears when I only cluster at the individual
> level:
> -xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(pid)
> first-
>
> The standard errors of my focus variable are still much larger than in the
> non-IV estimates, but smaller than under IV with two-way clustering. The
> focus variable is again not significant. The instrument is highly
> significant in the first-stage regression. The tests indicate that the
> hypotheses of a weak instrument and of underidentification can be rejected.
> I would interpret this as evidence that my instrument is valid and that my
> focus variable is endogenous.
>
> Conclusion:
> My interpretation is that the results generally suggest that my focus
> variable is not significant.
>
> Open questions:
> Is my interpretation wrong?
> Is my instrument good or bad - should I trust the results from the one-way
> or two-way clustering for the IV approach?
> In case I want to cluster on the regional level and correct for the bias due
> to a small number of clusters, I could use wild-bootstrapping as proposed by
> Cameron et al. (2008), but does that work for IV as well?
>
> Thanks very much for any clarification,
> Tobias
>
> Cited literature:
> Cameron, Gelbach, Miller (2008), Bootstrap-Based Improvements for Inference
> with Clustered Errors. The Review of Economics and Statistics, 90 (3),
> 414-427.
> Donald, Lang (2007), Inference with Difference-in-Differences and Other
> Panel Data. The Review of Economics and Statistics, 89 (2), 221-233.
> Moulton (1990), An Illustration of a Pitfall in Estimating the Effects of
> Aggregate Variables on Micro Units. The Review of Economics and Statistics,
> 72 (2), 334-338.