From: "Tobias Pfaff" <tobias.pfaff@uni-muenster.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: Intraclass correlation coefficient biased if number of clusters is small?
Date: Thu, 6 Dec 2012 09:43:29 +0100

Dear Statalisters,

Can I trust estimates of the intraclass correlation coefficient (ICC) when the number of clusters/groups is small?

I need the answer for a setting where adjusting the standard errors is recommended: a regression with a dependent variable at the individual level and the key regressor aggregated at the regional level, with a small number of clusters.

EMPIRICS:
*********
Regions (clusters) = 6
Observations = 80,000

-regress indiv_depvar regional_indepvar micro_indepvars-
-predict res, resid-
-loneway res region-
=> rho_e (ICC of the residuals) = 0.73837

Now with region dummies:
-regress indiv_depvar regional_indepvar micro_indepvars i.region-
(...)
=> rho_e = 0

(The same happens when I use Moulton's formula for intraclass correlation with Steve Pischke's moulton.ado, http://economics.mit.edu/faculty/angrist/data1/mhe/brl, and it happens with nested data as well as with non-nested data.)

My number of clusters is small (= 6), so I would normally assume that my standard errors are biased downward. To correct for the downward bias I could use a parametric correction with the Moulton factor. However, if the ICC of the residuals is zero, the Moulton factor is 1, which means that my standard errors are multiplied by 1 and are effectively not corrected.

THEORY: (details on the literature below)
*******
Angrist & Pischke (2009) suggest a parametric correction with the Moulton factor as one option if the number of clusters is small (p. 322).

However, Feng et al. (2001) say that "rho is typically estimable only poorly in GRTs" (Feng et al., p. 169) [GRT = group-randomized trials]. Further, "(...) because in most GRTs the number of groups is relatively small, the estimate of the between-group variance, sigma_between^2, has small df and a large standard error. (...) ignoring the lack of precision with which the ICC (...) is estimated can also lead to incorrect results: underestimating rho, or using a wrong df (often too big) in testing the intervention effect."

NOW, WHAT SHOULD BE MY CONCLUSION?

a) I include region dummies, rho_e = 0 tells me that there is no intraclass correlation in the residuals, I conclude that my standard errors are not biased downward, and I don't correct the standard errors at all, or

b) I cannot trust the estimate of the ICC in a setting with a small number of clusters, and need to apply another adjustment method such as the wild bootstrap, and

c) Does that mean that the parametric correction with the Moulton factor is eventually not suitable for settings where the number of clusters is small, because rho cannot be estimated correctly?

It's just strange, since Angrist & Pischke (2009) suggest this method for few-cluster settings, while Angrist & Lavy (2009) warn that "parametric cluster adjustments [are] too optimistic" (p. 1392), citing Feng et al. (2001). But maybe I missed something.

Thanks very much for any comments.

Regards,
Tobias

Angrist & Lavy (2009), The effects of high stakes high school achievement awards: evidence from a randomized trial, The American Economic Review, 99(4), 1384-1414.
Angrist & Pischke (2009), Mostly Harmless Econometrics, Princeton University Press.
Feng, Ziding, P. Diehr, A. Peterson, and D. McLerran (2001), Selected statistical issues in group randomized trials, Annual Review of Public Health, 22, 167-87.
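
The two rho_e values reported in the post can be illustrated with a minimal sketch, written in Python rather than Stata and using simulated data, not the data from the post. It assumes equal-sized groups, the one-way ANOVA estimator of the ICC truncated at zero, and the simplest form of the Moulton factor for a regressor that is constant within groups, where the standard error multiplier is sqrt(1 + (n - 1) * rho). The function names `anova_icc` and `moulton_se_multiplier` are hypothetical and are not taken from moulton.ado. Subtracting each group's mean stands in for the residuals of a regression that includes group dummies, since such residuals average to zero within every group.

```python
import math
import random


def anova_icc(groups):
    """One-way ANOVA estimate of the ICC for equal-sized groups.

    Negative estimates are truncated at zero (an assumption of this sketch).
    """
    k = len(groups)        # number of groups (clusters)
    n = len(groups[0])     # observations per group
    grand = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    # Between-group and within-group mean squares.
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    rho = (msb - msw) / (msb + (n - 1) * msw)
    return max(0.0, rho)


def moulton_se_multiplier(n_per_group, rho):
    """Standard error multiplier sqrt(1 + (n - 1) * rho).

    Assumes equal group sizes and a regressor that is constant within groups.
    """
    return math.sqrt(1.0 + (n_per_group - 1) * rho)


random.seed(1)
k, n = 6, 500  # illustrative sizes only

# Simulated residuals that share a common group-level component.
effects = [random.gauss(0, 1) for _ in range(k)]
raw = [[e + random.gauss(0, 1) for _ in range(n)] for e in effects]

# Remove each group's mean, as group dummies would do to the residuals.
demeaned = [[y - sum(g) / n for y in g] for g in raw]

rho_raw = anova_icc(raw)
rho_demeaned = anova_icc(demeaned)

print("ICC without group dummies:", rho_raw,
      "multiplier:", moulton_se_multiplier(n, rho_raw))
print("ICC with group means removed:", rho_demeaned,
      "multiplier:", moulton_se_multiplier(n, rho_demeaned))
```

Under these assumptions, the between-group mean square of the demeaned residuals is zero up to rounding, so the untruncated estimate is about -1/(n - 1) and the truncated ICC is 0 regardless of how strongly the original errors are correlated within groups. The multiplier is then 1 by construction.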