Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: ivreg2 - weak instruments?
From: Phil <[email protected]>
To: [email protected]
Subject: st: ivreg2 - weak instruments?
Date: Thu, 29 Apr 2010 13:22:38 +0200
Dear statalist,
I estimate two model specifications in which I suspect one right-hand-side
variable to be endogenous because of reverse causality. I test this
assumption by instrumental-variable estimation, using two excluded
instruments for the potentially endogenous variable. The two
specifications differ in that the potentially endogenous variable
enters model 1 only once, whereas in model 2 it also enters as a
squared term and as an interaction term.
Model 1 looks as follows:
Y = b0 + b1*x1 + b2*x2 + b3*z + u
where x1 is the potentially endogenous variable, x2 is an exogenous
variable, and z are the instruments.
I estimate this model with ivreg2, and the tests (first-stage F-test,
partial R-squared, Kleibergen-Paap, Hansen’s J, endog option) indicate
that the instruments are valid and that instrumenting is necessary
(see below for output).
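For readers without access to the data, the structure of Model 1 can be sketched on simulated data. This is only an illustration: the variable names (y, x1, x2, z1, z2), the seed, and all coefficients are hypothetical and not taken from my dataset, and the sketch assumes that ivreg2 and its companion ranktest are installed from SSC.

```stata
* Hypothetical simulated data; not the dataset used in this post
clear
set seed 12345
set obs 1000
generate z1 = rnormal()
generate z2 = rnormal()
generate x2 = rnormal()
generate e  = rnormal()
* x1 is endogenous because it shares the error term e with y
generate x1 = 0.8*z1 + 0.5*z2 + 0.3*x2 + 0.5*e + rnormal()
generate y  = 1 + 0.5*x1 + 0.2*x2 + e
* One endogenous regressor (x1), two excluded instruments (z1 z2)
ivreg2 y x2 (x1 = z1 z2), robust first endog(x1)
```

With two excluded instruments for one endogenous regressor there is one overidentifying restriction, which matches the single degree of freedom of the Hansen J statistic in the Model 1 output.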
Model 2 is the following:
Y = b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x1*x2 + b5*z + u
In this model there are three potentially endogenous variables, because
the squared term and the interaction term both contain the potentially
endogenous variable x1 (x1, x1^2, x1*x2). I constructed the instruments
accordingly, i.e. by squaring them and by interacting them with x2. I
found that including all six of these instruments causes the
overidentification test to reject. However, when I include only four of
the instruments, the overidentification test is passed. In this case
Shea’s partial R-squared for the three potentially endogenous variables
ranges from about 0.3 through 0.7 to 0.99 (see output below). What
confuses me is that the underidentification test (Kleibergen-Paap)
fails dramatically in the second model, with P-values of 1.0, while the
other tests for the instruments look okay. Moreover, these instruments
worked well in model 1 with only one endogenous variable. How should I
interpret these results? Is the Kleibergen-Paap test valid with
multiple endogenous variables, and does it mean that the instruments
are weak in model 2?
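To make the instrument construction concrete, here is a minimal sketch on simulated data. Again, every variable name, the seed, and the coefficients are hypothetical, ivreg2 and ranktest are assumed to be installed from SSC, and the sketch is not meant to reproduce my results.

```stata
* Hypothetical simulated data; not the dataset used in this post
clear
set seed 2010
set obs 2000
generate z1 = rnormal()
generate z2 = rnormal()
generate x2 = rnormal()
generate e  = rnormal()
generate x1 = 0.8*z1 + 0.5*z2 + 0.3*x2 + 0.5*e + rnormal()
* Endogenous terms: x1, its square, and its interaction with x2
generate x1sq  = x1^2
generate x1_x2 = x1*x2
* Instruments built the same way: squares and interactions with x2
generate z1sq  = z1^2
generate z2sq  = z2^2
generate z1_x2 = z1*x2
generate z2_x2 = z2*x2
generate y = 1 + 0.5*x1 + 0.2*x2 - 0.1*x1sq + 0.3*x1_x2 + e
* Three endogenous regressors, six excluded instruments
ivreg2 y x2 (x1 x1sq x1_x2 = z1 z2 z1sq z2sq z1_x2 z2_x2), ///
    robust first endog(x1 x1sq x1_x2)
* Stored results for the underidentification and overidentification tests
display "Kleibergen-Paap rk LM p-value: " e(idp)
display "Hansen J p-value:              " e(jp)
```

Six instruments for three endogenous regressors give three overidentifying restrictions; keeping only four instruments, as in my Model 2, leaves one.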
I would appreciate any help in this matter.
Best
Phil
Model 1
. ivreg2 fas3 sizei sizesqi skilli4 sizei_skill4i invcj tcj2 skill4sqi_tcj2
> tci2 dist argentina australia austria belgium brazil bulgaria canada chile
> chinamainland czechrepublic denmark finland france germany greece hongkong
> hungary ireland italy korea malaysia mexico netherlands newzealand norway
> philippines poland portugal romania russia singapore spain sweden
> switzerland taiwan thailand turkey usa argentina2 australia2 austria2
> belgium2 brazil2 bulgaria2 canada2 chile2 chinamainland2 czechrepublic2
> denmark2 finland2 france2 germany2 greece2 hongkong2 hungary2 ireland2
> italy2 korea2 malaysia2 mexico2 netherlands2 newzealand2 norway2
> philippines2 poland2 portugal2 romania2 russia2 singapore2 spain2 sweden2
> switzerland2 taiwan2 thailand2 turkey2 usa2 _2001 _2002 _2003 _2004 _2005
> (sumgdp = inst sumlat2), robust first endogtest(sumgdp)
First-stage regressions
-----------------------
First-stage regression of sumgdp:
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 4366
F( 92, 4273) = 57315.23
Prob > F = 0.0000
Total (centered) SS = 3.93926e+16 Centered R2 = 0.9996
Total (uncentered) SS = 5.88246e+16 Uncentered R2 = 0.9997
Residual SS = 1.75151e+13 Root MSE = 64024
------------------------------------------------------------------------------
| Robust
sumgdp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sizei | 96125.36 25162.36 3.82 0.000 46794.06 145456.7
sizesqi | -66612.46 10634.54 -6.26 0.000 -87461.68 -45763.23
skilli4 | 14291.49 7066.856 2.02 0.043 436.784 28146.2
sizei_ski~4i | 13911.5 15459.9 0.90 0.368 -16397.94 44220.95
invcj | -27860.7 5149.074 -5.41 0.000 -37955.56 -17765.84
tcj2 | -26531.13 2912.961 -9.11 0.000 -32242.04 -20820.21
skill4sqi_~2 | -119.5441 137.8151 -0.87 0.386 -389.7333 150.6452
tci2 | -37091.49 2328.54 -15.93 0.000 -41656.64 -32526.35
dist | 6.129196 1.03451 5.92 0.000 4.101018 8.157373
….
inst | .8927053 .0091796 97.25 0.000 .8747086 .910702
sumlat2 | 16156 939.7222 17.19 0.000 14313.66 17998.34
_cons | -885762.3 86184.74 -10.28 0.000 -1054729 -716795.4
------------------------------------------------------------------------------
Included instruments: sizei sizesqi skilli4 sizei_skill4i invcj tcj2
------------------------------------------------------------------------------
Partial R-squared of excluded instruments: 0.9370
Test of excluded instruments:
F( 2, 4273) = 10904.41
Prob > F = 0.0000
Summary results for first-stage regressions
-------------------------------------------
Variable    | Shea Partial R2 |   Partial R2    |  F(  2,  4273)    P-value
sumgdp      |     0.9370      |     0.9370      |    10904.41       0.0000
NB: first-stage F-stat heteroskedasticity-robust
Underidentification tests
Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
Ha: matrix has rank=K1 (identified)
Kleibergen-Paap rk LM statistic Chi-sq(2)=1019.26 P-val=0.0000
Kleibergen-Paap rk Wald statistic Chi-sq(2)=22283.48 P-val=0.0000
Weak identification test
Ho: equation is weakly identified
Kleibergen-Paap Wald rk F statistic 10904.41
See main output for Cragg-Donald weak id test critical values
Weak-instrument-robust inference
Tests of joint significance of endogenous regressors B1 in main equation
Ho: B1=0 and overidentifying restrictions are valid
Anderson-Rubin Wald test F(2,4273)=21.41 P-val=0.0000
Anderson-Rubin Wald test Chi-sq(2)=43.74 P-val=0.0000
Stock-Wright LM S statistic Chi-sq(2)=39.38 P-val=0.0000
NB: Underidentification, weak identification and weak-identification-robust
test statistics heteroskedasticity-robust
Number of observations N = 4366
Number of regressors K = 92
Number of instruments L = 93
Number of excluded instruments L1 = 2
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 4366
F( 91, 4274) = 13.81
Prob > F = 0.0000
Total (centered) SS = 2.40966e+12 Centered R2 = 0.4117
Total (uncentered) SS = 2.64194e+12 Uncentered R2 = 0.4635
Residual SS = 1.41751e+12 Root MSE = 18019
------------------------------------------------------------------------------
| Robust
fas3 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sumgdp | .0149349 .0023039 6.48 0.000 .0104194 .0194505
sizei | 69406.44 7477.315 9.28 0.000 54751.17 84061.71
sizesqi | -35124.18 3799.468 -9.24 0.000 -42571 -27677.36
skilli4 | 23967.22 2744.369 8.73 0.000 18588.36 29346.09
sizei_ski~4i | -26711.56 3870.27 -6.90 0.000 -34297.16 -19125.97
invcj | -2308.65 1534.638 -1.50 0.132 -5316.484 699.1849
tcj2 | 1914.868 618.6698 3.10 0.002 702.2977 3127.439
skill4sqi_~2 | -306.3677 46.95296 -6.52 0.000 -398.3938 -214.3416
tci2 | -1620.602 611.842 -2.65 0.008 -2819.79 -421.4134
dist | -2.040303 .2484136 -8.21 0.000 -2.527184 -1.553421
….
_cons | -21100.4 8803.58 -2.40 0.017 -38355.1 -3845.697
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 1019.262
Chi-sq(2) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Kleibergen-Paap rk Wald F statistic): 1.1e+04
Stock-Yogo weak ID test critical values: 10% maximal IV size             19.93
                                         15% maximal IV size             11.59
                                         20% maximal IV size              8.75
                                         25% maximal IV size              7.25
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 0.014
Chi-sq(1) P-val = 0.9046
-endog- option:
Endogeneity test of endogenous regressors: 7.962
Chi-sq(1) P-val = 0.0048
Regressors tested: sumgdp
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Model 2
. ivreg2 fas3 skdiff4 invcj tcj2 tcj2xskdiffsq4 tci2 dist argentina australia
> austria belgium brazil bulgaria canada chile chinamainland czechrepublic
> denmark finland france germany greece hongkong hungary ireland italy korea
> malaysia mexico netherlands newzealand norway philippines poland portugal
> romania russia singapore spain sweden switzerland taiwan thailand turkey
> usa argentina2 australia2 austria2 belgium2 brazil2 bulgaria2 canada2
> chile2 chinamainland2 czechrepublic2 denmark2 finland2 france2 germany2
> greece2 hongkong2 hungary2 ireland2 italy2 korea2 malaysia2 mexico2
> netherlands2 newzealand2 norway2 philippines2 poland2 portugal2 romania2
> russia2 singapore2 spain2 sweden2 switzerland2 taiwan2 thailand2 turkey2
> usa2 _2001 _2002 _2003 _2004 _2005
> (sumgdp gdpdiffsq gdpdiffxskdiff4 = sumlat2 latdiffsq2 instdiffsq
> instdiffxskdiff4), endog(sumgdp gdpdiffsq gdpdiffxskdiff4) first robust
First-stage regressions
-----------------------
First-stage regression of sumgdp:
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 4366
F( 91, 4274) = 713.56
Prob > F = 0.0000
Total (centered) SS = 3.93926e+16 Centered R2 = 0.9957
Total (uncentered) SS = 5.88246e+16 Uncentered R2 = 0.9971
Residual SS = 1.67917e+14 Root MSE = 198212
------------------------------------------------------------------------------
| Robust
sumgdp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
skdiff4 | 168531.2 55878.43 3.02 0.003 58980.48 278082
invcj | -127174.4 16629.73 -7.65 0.000 -159777.3 -94571.53
tcj2 | -116866.4 8854.973 -13.20 0.000 -134226.7 -99506.05
tcj2xskdif~4 | 71164.2 11107.05 6.41 0.000 49388.61 92939.78
tci2 | -170123.4 6870.852 -24.76 0.000 -183593.8 -156652.9
dist | 28.42898 2.64817 10.74 0.000 23.23719 33.62077
argentina | 401804.7 42340.53 9.49 0.000 318795.3 484814.1
….
sumlat2 | 73606.66 2152.339 34.20 0.000 69386.96 77826.36
latdiffsq2 | 71.83413 17.64345 4.07 0.000 37.2438 106.4245
instdiffsq | 2.30e-08 1.14e-09 20.09 0.000 2.08e-08 2.52e-08
instdiffxs~4 | .003527 .0080782 0.44 0.662 -.0123104 .0193645
_cons | -4154066 232882.3 -17.84 0.000 -4610636 -3697496
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Partial R-squared of excluded instruments: 0.3986
Test of excluded instruments:
F( 4, 4274) = 498.38
Prob > F = 0.0000
First-stage regression of gdpdiffsq:
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 4366
F( 91, 4274) = 25105.65
Prob > F = 0.0000
Total (centered) SS = 3.90010e+30 Centered R2 = 0.9987
Total (uncentered) SS = 4.41189e+30 Uncentered R2 = 0.9989
Residual SS = 5.06592e+27 Root MSE = 1.1e+12
------------------------------------------------------------------------------
| Robust
gdpdiffsq | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
skdiff4 | -1.13e+11 2.27e+11 -0.50 0.618 -5.58e+11 3.32e+11
invcj | 1.92e+11 8.28e+10 2.32 0.020 3.01e+10 3.55e+11
tcj2 | -3.89e+10 4.74e+10 -0.82 0.411 -1.32e+11 5.39e+10
tcj2xskdif~4 | 1.72e+11 6.46e+10 2.66 0.008 4.54e+10 2.99e+11
tci2 | 8.78e+10 2.97e+10 2.95 0.003 2.95e+10 1.46e+11
dist | 3.34e+07 1.75e+07 1.90 0.057 -983252.2 6.78e+07
…..
sumlat2 | 4.69e+09 1.12e+10 0.42 0.675 -1.73e+10 2.66e+10
latdiffsq2 | -6.41e+08 1.46e+08 -4.38 0.000 -9.27e+08 -3.54e+08
instdiffsq | .9843647 .0128443 76.64 0.000 .9591831 1.009546
instdiffxs~4 | -183051.9 109839.8 -1.67 0.096 -398395 32291.22
_cons | -5.67e+11 1.22e+12 -0.46 0.643 -2.97e+12 1.83e+12
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Partial R-squared of excluded instruments: 0.9246
Test of excluded instruments:
F( 4, 4274) = 1629.39
Prob > F = 0.0000
First-stage regression of gdpdiffxskdiff4:
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 4366
F( 91, 4274) = 8252.74
Prob > F = 0.0000
Total (centered) SS = 2.08810e+15 Centered R2 = 0.9985
Total (uncentered) SS = 2.12589e+15 Uncentered R2 = 0.9986
Residual SS = 3.05358e+12 Root MSE = 26729
------------------------------------------------------------------------------
| Robust
gdpdiffxsk~4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
skdiff4 | 11840.13 6163.584 1.92 0.055 -243.6901 23923.96
invcj | -5915.838 1844.225 -3.21 0.001 -9531.476 -2300.201
tcj2 | 7376.204 1419.659 5.20 0.000 4592.936 10159.47
tcj2xskdif~4 | -23616.51 2753.272 -8.58 0.000 -29014.35 -18218.66
tci2 | 2605.138 796.9941 3.27 0.001 1042.616 4167.66
dist | .1873208 .4076901 0.46 0.646 -.6119634 .986605
….
sumlat2 | 78.7583 249.6652 0.32 0.752 -410.7151 568.2317
latdiffsq2 | -2.176758 3.766132 -0.58 0.563 -9.560333 5.206816
instdiffsq | -3.19e-10 2.61e-10 -1.22 0.222 -8.31e-10 1.93e-10
instdiffxs~4 | 1.151003 .002057 559.55 0.000 1.14697 1.155036
_cons | -8679.519 27732.8 -0.31 0.754 -63050.2 45691.16
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Partial R-squared of excluded instruments: 0.9982
Test of excluded instruments:
F( 3, 4274) = 1.1e+05
Prob > F = 0.0000
Summary results for first-stage regressions
-------------------------------------------
Variable    | Shea Partial R2 |   Partial R2    |  F(  4,  4274)    P-value
sumgdp      |     0.2974      |     0.3986      |      498.38       0.0000
gdpdiffsq   |     0.6958      |     0.9246      |     1629.39       0.0000
gdpdiffxskdi|     0.9874      |     0.9982      |     1.1e+05       0.0000
NB: first-stage F-stat heteroskedasticity-robust
Underidentification tests
Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
Ha: matrix has rank=K1 (identified)
Kleibergen-Paap rk LM statistic Chi-sq(2)=0.00 P-val=1.0000
Kleibergen-Paap rk Wald statistic Chi-sq(2)=0.00 P-val=1.0000
Weak identification test
Ho: equation is weakly identified
Kleibergen-Paap Wald rk F statistic 0.00
See main output for Cragg-Donald weak id test critical values
Weak-instrument-robust inference
Tests of joint significance of endogenous regressors B1 in main equation
Ho: B1=0 and overidentifying restrictions are valid
Anderson-Rubin Wald test F(3,4274)=59.02 P-val=0.0000
Anderson-Rubin Wald test Chi-sq(3)=180.88 P-val=0.0000
Stock-Wright LM S statistic Chi-sq(3)=138.06 P-val=0.0000
NB: Underidentification, weak identification and weak-identification-robust
test statistics heteroskedasticity-robust
Number of observations N = 4366
Number of regressors K = 91
Number of instruments L = 92
Number of excluded instruments L1 = 3
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 4366
F( 90, 4275) = 14.89
Prob > F = 0.0000
Total (centered) SS = 2.40966e+12 Centered R2 = 0.5528
Total (uncentered) SS = 2.64194e+12 Uncentered R2 = 0.5921
Residual SS = 1.07757e+12 Root MSE = 15710
------------------------------------------------------------------------------
| Robust
fas3 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sumgdp | .0036165 .0025838 1.40 0.162 -.0014477 .0086807
gdpdiffsq | -1.73e-09 2.07e-10 -8.37 0.000 -2.14e-09 -1.33e-09
gdpdiffxsk~4 | -.0112809 .000798 -14.14 0.000 -.012845 -.0097168
skdiff4 | 4674.125 3803.89 1.23 0.219 -2781.363 12129.61
invcj | -2049.643 1190.903 -1.72 0.085 -4383.769 284.4835
tcj2 | -1977.099 593.6901 -3.33 0.001 -3140.71 -813.488
tcj2xskdif~4 | -8263.003 978.4968 -8.44 0.000 -10180.82 -6345.184
tci2 | -3647.576 639.0047 -5.71 0.000 -4900.002 -2395.15
dist | -1.728639 .2666684 -6.48 0.000 -2.2513 -1.205979
…..
_cons | 42783.49 9530.638 4.49 0.000 24103.78 61463.2
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 0.000
Chi-sq(2) P-val = 1.0000
------------------------------------------------------------------------------
Weak identification test (Kleibergen-Paap rk Wald F statistic): 0.000
Stock-Yogo weak ID test critical values: <not available>
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 0.269
Chi-sq(1) P-val = 0.6039
-endog- option:
Endogeneity test of endogenous regressors: 125.171
Chi-sq(3) P-val = 0.0000
Regressors tested: sumgdp gdpdiffsq gdpdiffxskdiff4
------------------------------------------------------------------------------
------------------------------------------------------------------------------