Critical values for the Stock and Yogo test when the cluster option is used

Dear Professor Schaffer,

I am grateful to you for your quick reply. I read your paper (especially the part related to my question, page 24) and I am a little bit confused. I would be grateful if you could tell me whether I am right. In your paper you write: "If the user specifies the robust cluster options in ivreg2, the reported weak instruments test statistic is a Wald F-statistic based on the Kleibergen-Paap rk statistic." Then you continue: "In our view, however, the use of the rk Wald statistic, as the robust analog of the Cragg-Donald statistic, is a sensible choice and clearly superior to the use of the latter in the presence of .. clustering".

If I understand this correctly, I do not have to worry that my instrument is weak; maybe I wasn't clear enough in my previous mail. To be sure that I am right, I present my output below (though I do not know why, having specified the cluster option, I still obtain the Cragg-Donald statistic and not the Kleibergen-Paap rk statistic that I see in the outputs in your paper).

Can I come to a conclusion based on the output below? If yes, should I compare 158 to 16.38, or should I compare 5.26 to 16.38?

This is the output:

ivreg2 entitled gender fatheduc motheduc siblings tipuah index girlsprop olevatik fath_n_america_europe fath_asia_africa fath_otherloc moth_n_america_europe moth_asia_africa school_dummy nstud12 nstud12squr (pupils = ex_cs) if schl_iy==1, first robust cluster(schlkita)

First-stage regressions

First-stage regression of pupils:

OLS estimation

Statistics robust to heteroskedasticity and clustering on schlkita

Number of clusters (schlkita) = 160 Number of obs = 3477
F( 17, 159) = 20.91
Prob > F = 0.0000
Total (centered) SS = 164186.2675 Centered R2 = 0.4071
Total (uncentered) SS = 3496104 Uncentered R2 = 0.9722
Residual SS = 97338.94342 Root MSE = 5.305

             |               Robust
      pupils |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
      gender |   .1145447   .1273215    0.90   0.370    -.1369148    .3660043
    fatheduc |   .1756328   .0469734    3.74   0.000     .0828605    .2684051
    motheduc |   .1615434   .0685023    2.36   0.020     .0262517    .2968351
    siblings |   .1086323   .1211446    0.90   0.371    -.1306278    .3478923
      tipuah |  -.2036365   .1995934   -1.02   0.309    -.5978327    .1905598
       index |   .7451653   .6627006    1.12   0.263    -.5636659    2.053997
   girlsprop |   2.874038   1.229628    2.34   0.021      .445527    5.302549
    olevatik |  -.1522312   .3756952   -0.41   0.686    -.8942278    .5897653
fath_n_ame~e |   .0048261   .2376222    0.02   0.984    -.4644768    .4741289
fath_asia_~a |  -.4640889   .2690778   -1.72   0.087    -.9955164    .0673387
fath_other~c |    .558478   1.373149    0.41   0.685    -2.153487    3.270443
moth_n_ame~e |  -.3222788    .250198   -1.29   0.200    -.8164188    .1718613
moth_asia_~a |  -.0316187   .3215067   -0.10   0.922    -.6665932    .6033559
school_dummy |  -.2655496   1.621763   -0.16   0.870    -3.468526    2.937427
     nstud12 |   .0284393   .0381495    0.75   0.457    -.0469059    .1037845
 nstud12squr |  -.0000321   .0000744   -0.43   0.667    -.0001791     .000115
       ex_cs |   .4494619   .1959078    2.29   0.023     .0625446    .8363791
       _cons |   7.021915   4.437086    1.58   0.116    -1.741313    15.78514
-------------+---------------------------------------------------------------
Included instruments: gender fatheduc motheduc siblings tipuah index girlsprop
olevatik fath_n_america_europe fath_asia_africa
fath_otherloc moth_n_america_europe moth_asia_africa
school_dummy nstud12 nstud12squr ex_cs
Partial R-squared of excluded instruments: 0.0437
Test of excluded instruments:
F( 1, 159) = 5.26
Prob > F = 0.0231

Summary results for first-stage regressions

Variable | Shea Partial R2 | Partial R2 | F( 1, 159) P-value
pupils | 0.0437 | 0.0437 | 5.26 0.0231

NB: first-stage F-stat cluster-robust

Underidentification tests:
Chi-sq(1) P-value
Anderson canon. corr. likelihood ratio stat. 155.36 0.0000
Cragg-Donald N*minEval stat. 158.88 0.0000
Ho: matrix of reduced form coefficients has rank=K-1 (underidentified)
Ha: matrix has rank>=K (identified)

NB: underidentification statistics not robust

Anderson-Rubin test of joint significance of
endogenous regressors B1 in main equation, Ho:B1=0
F(1,159)= 5.90 P-val=0.0162
Chi-sq(1)= 5.97 P-val=0.0145
NB: Anderson-Rubin stat cluster-robust

Number of clusters N_clust = 160
Number of observations N = 3477
Number of regressors K = 18
Number of instruments L = 18
Number of excluded instruments L2 = 1

IV (2SLS) estimation

Statistics robust to heteroskedasticity and clustering on schlkita

Number of clusters (schlkita) = 160 Number of obs = 3477
F( 17, 159) = 4.64
Prob > F = 0.0000
Total (centered) SS = 535.3396606 Centered R2 = -0.2609
Total (uncentered) SS = 2816 Uncentered R2 = 0.7603
Residual SS = 675.0333424 Root MSE = .4406

             |               Robust
    entitled |      Coef.  Std. Err.       z   P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
      pupils |  -.0364681   .0175529   -2.08   0.038    -.0708711   -.0020652
      gender |   .0492141   .0163051    3.02   0.003     .0172566    .0811716
    fatheduc |   .0141794   .0044121    3.21   0.001     .0055319    .0228269
    motheduc |   .0150479   .0057364    2.62   0.009     .0038048     .026291
    siblings |   .0088987   .0082991    1.07   0.284    -.0073673    .0251646
      tipuah |  -.0272665   .0116795   -2.33   0.020     -.050158   -.0043751
       index |   .0514334   .0409498    1.26   0.209    -.0288267    .1316935
   girlsprop |   .1473534    .061805    2.38   0.017     .0262178     .268489
    olevatik |  -.0408316   .0310203   -1.32   0.188    -.1016302     .019967
fath_n_ame~e |  -.0009037   .0204988   -0.04   0.965    -.0410805    .0392731
fath_asia_~a |  -.0435657   .0256591   -1.70   0.090    -.0938566    .0067252
fath_other~c |  -.1169402    .146099   -0.80   0.423    -.4032889    .1694085
moth_n_ame~e |  -.0277134   .0206725   -1.34   0.180    -.0682307    .0128039
moth_asia_~a |  -.0140052   .0275814   -0.51   0.612    -.0680638    .0400534
school_dummy |   .1578893   .0780717    2.02   0.043     .0048717     .310907
     nstud12 |   .0047183   .0020431    2.31   0.021     .0007138    .0087228
 nstud12squr |  -7.48e-06   3.84e-06   -1.95   0.052     -.000015    5.02e-08
       _cons |   .8993509   .3155496    2.85   0.004     .2808851    1.517817
-------------+---------------------------------------------------------------
Anderson canon. cor. LR statistic (identification/IV relevance test): 155.358
Chi-sq(1) P-val = 0.0000
Test statistic(s) not robust
Cragg-Donald F statistic (weak identification test): 158.058
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Test statistic(s) not robust
Source: Stock-Yogo (2005). Reproduced by permission.
Hansen J statistic (overidentification test of all instruments): 0.000
(equation exactly identified)
Instrumented: pupils
Included instruments: gender fatheduc motheduc siblings tipuah index girlsprop
olevatik fath_n_america_europe fath_asia_africa
fath_otherloc moth_n_america_europe moth_asia_africa
school_dummy nstud12 nstud12squr
Excluded instruments: ex_cs

Dear Professor Austin and other Statalist subscribers,

I only want to add that in the email today I clarified
something that wasn't clear enough in the private mail
yesterday. The point is that the results differ greatly if I
use the clustered F-statistic instead of the regular one.
The regular F-statistic is 158, which is far above the
regular critical values (these lie between 5.53 and
16.38). This indicates that the instrument is extremely
strong. However, if I compare the clustered F-statistic
(5.26) to the regular critical values, I may come to the
opposite conclusion, that my instrument is quite weak (it is
only close to the 25% maximal IV size value). Should I worry
about the strength of my instrument?

The short answer is "yes", you should worry, and your intuition below is

The problem starts out just like the usual problem of using non-robust
SEs for inference.  If the disturbance is heteroskedastic or clustered,
the usual SEs are wrong, and usually [sic] you'll get SEs that are "too
small" and test stats that are "too big".  The same features of the data
that give you these test stats that are "too big" will give you a
first-stage F-stat that is also "too big", and so the Stock-Yogo
critical values will be wrong.
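
The size of this effect can be read off the output quoted above. With a single excluded instrument, the first-stage F is the square of the t-statistic on that instrument, so the ratio of the two F statistics shows how much clustering inflates the standard error. The Python sketch below is only illustrative arithmetic: it assumes the Cragg-Donald F of 158.058 can be read as the non-robust first-stage F (reasonable with one endogenous regressor), and the variable names are made up for the example.

```python
import math

# Values copied from the ivreg2 output quoted in this thread.
coef_ex_cs = 0.4494619   # first-stage coefficient on the instrument ex_cs
se_cluster = 0.1959078   # its cluster-robust standard error
f_nonrobust = 158.058    # Cragg-Donald F (assumed equal to the non-robust first-stage F)

# One excluded instrument: F is the squared t-statistic.
t_cluster = coef_ex_cs / se_cluster
f_cluster = t_cluster ** 2

# Non-robust SE implied by the non-robust F, and the inflation factor.
se_nonrobust = coef_ex_cs / math.sqrt(f_nonrobust)
se_ratio = se_cluster / se_nonrobust

print(f"cluster-robust F      = {f_cluster:.2f}")
print(f"implied non-robust SE = {se_nonrobust:.4f}")
print(f"SE inflation factor   = {se_ratio:.2f}")
```

The squared t-statistic reproduces the reported cluster-robust F of 5.26, and under the stated assumption the cluster-robust SE is roughly five and a half times the non-robust one.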

Where it gets complicated is that the S-Y critical values for weak
identification come from Monte Carlos.  My understanding is that if you
want the "right" critical values for cases where the standard S-Y iid
assumption is loosened, you have to specify how it's loosened, i.e., in
your case, what kind of clustering you've got.

That said, using a cluster-robust first-stage F stat with the S-Y
critical values is not bad and in the absence of anything better is
probably the best you can do.
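
As a rough illustration of that comparison, the Python sketch below checks each F statistic from the output against the Stock-Yogo critical values printed there. The helper `smallest_size_passed` is a hypothetical name introduced for this example, and the critical values were derived under i.i.d. errors, so the result for the cluster-robust F is only an approximation.

```python
# Stock-Yogo critical values from the output (one endogenous regressor,
# one excluded instrument): maximal IV size -> critical value.
SY_CRITICAL = {0.10: 16.38, 0.15: 8.96, 0.20: 6.66, 0.25: 5.53}


def smallest_size_passed(f_stat, critical=SY_CRITICAL):
    """Return the smallest maximal IV size whose critical value f_stat exceeds,
    or None if f_stat is below every tabulated critical value."""
    passed = [size for size, cv in critical.items() if f_stat > cv]
    return min(passed) if passed else None


for label, f_stat in [("non-robust (Cragg-Donald) F", 158.058),
                      ("cluster-robust first-stage F", 5.26)]:
    print(label, f_stat, smallest_size_passed(f_stat))
```

On these numbers the non-robust F clears even the 10% critical value, while the cluster-robust F of 5.26 falls just short of the 25% value of 5.53.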

It's also worth noting that the first-stage F-stat can also be used as a
test for *under*identification a la Anderson.  (See the paper by myself,
Kit Baum and Steve Stillman in the latest issue of the SJ, vol. 7 no. 4
2007.)  This is a test with an asymptotic justification and uses
standard critical values, and the robust first-stage F stat with these
standard critical values is fine.  Since underidentification is a lower
hurdle than weak identification, if you can't reject the null that your
equation is underidentified, you can pretty safely also fail to reject
the null that it's weakly identified.

Hope this helps.


After all, I guess that as the
clustered F-statistic is much lower than the regular one, the
clustered critical value should also be lower? That is, don't
we require too much when we compare the clustered F-statistic
to the regular critical values?



