Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Erkan Duman <erduman@sabanciuniv.edu> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: rare event problem in first stage of IV2SLS |
Date | Wed, 22 Jan 2014 18:15:04 +0200 |
I have a binary choice model with a binary endogenous variable. I am investigating the impact of migration experience on the school attendance of children in migrant households. I decided to use IV2SLS and bivariate probit, where the instrument is historical migration networks at the state level, which are supposed to explain why one household engages in migration and another, similar household does not.

I could not use bivariate probit because in every one of my specifications the assumption of bivariate normality of the errors is violated. IV2SLS does not need such an assumption; however, the predicted school attendance rates fall outside the [-1, 1] range, and the estimated migration coefficient is also out of range, at around 4. I controlled for multicollinearity and tried to control for as many variables as possible that might threaten the instrument's exogeneity. None of this worked for me: the estimated migration coefficient is still around 3.

Below you can find the two stages:

Second stage:  School attendance(i) = a + b*migration_hat(i) + error(i)
First stage:   migration(i) = c + d*historical migration rate(ij) + error2(i)

Chiburis et al. 2011 argue that when the treatment probability (in our case, the share of remittance-receiving households) is low, where low means below 0.1, linear IV estimation becomes very uninformative. When I searched for that problem, I came across King and Zeng 2001, which provides a way to correct for rare events. King and Zeng 2001 deal with a logit regression where the dependent variable is a rare event. In my case the first stage is a logit regression where the dependent variable is a rare event, and I believe that this rare event problem in the first stage is what produces coefficient estimates outside the [-1, 1] range. In my case the share of remittance-receiving households is 1.55%, which fits the rare event definition of King and Zeng 2001. That is, 1529 remittance-receiving households and 97038 non-receiving households (1529 1s and 97038 0s).
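A minimal sketch of the mechanism, on made-up numbers rather than the data described above: with a single instrument and no other covariates, the 2SLS coefficient b equals the reduced-form slope (attendance on the instrument) divided by the first-stage slope d (migration on the instrument). When the treatment is rare, d is necessarily small, so even a modest reduced-form slope can give a b far outside [-1, 1]. The two-state setup, the group sizes, and all function names below are hypothetical and chosen only for illustration.

```python
# Toy illustration (hypothetical data): a rare binary treatment gives a tiny
# first-stage slope, and the just-identified IV estimate is the ratio
# reduced_form / first_stage, which can then be far larger than 1.

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def make_group(z_value, n, n_migrants, n_attending):
    """Build n households sharing instrument value z_value.
    The first n_migrants have d = 1; the first n_attending have y = 1."""
    z = [z_value] * n
    d = [1 if i < n_migrants else 0 for i in range(n)]
    y = [1 if i < n_attending else 0 for i in range(n)]
    return z, d, y

def iv_estimates(y, d, z):
    """Return (first-stage slope, reduced-form slope, IV estimate)."""
    first_stage = cov(z, d) / cov(z, z)    # slope of d on z
    reduced_form = cov(z, y) / cov(z, z)   # slope of y on z
    return first_stage, reduced_form, reduced_form / first_stage

# Assumed toy numbers: low-network state (z = 0) has 1% migrants and 80%
# attendance; high-network state (z = 1) has 2% migrants and 84% attendance.
z0, d0, y0 = make_group(0, 1000, 10, 800)
z1, d1, y1 = make_group(1, 1000, 20, 840)
z, d, y = z0 + z1, d0 + d1, y0 + y1

first_stage, reduced_form, beta_iv = iv_estimates(y, d, z)
print(f"treatment share   = {mean(d):.4f}")
print(f"first-stage slope = {first_stage:.4f}")
print(f"reduced-form slope = {reduced_form:.4f}")
print(f"IV estimate       = {beta_iv:.2f}")
```

In this toy setup the overall treatment share is 1.5%, and a four-percentage-point difference in attendance is attributed to a one-percentage-point difference in migration, so the ratio lands well above 1. This only illustrates the arithmetic of the ratio; it says nothing about whether the instrument in the actual application is valid or weak.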
I thought of using the King and Zeng 2001 correction method (relogit in Stata) in the first-stage regression and plugging the predicted values from the first stage into the second stage; however, in that case the standard errors from the first stage need to be corrected. Plus, I am not sure whether this way of handling the problem is correct, and I do not know how to correct the standard errors. I could not find any relevant material that deals with rare events in an instrumental variable estimation setting. Can you please help me solve the rare event problem in the first stage of an instrumental variable estimation strategy?

--
Erkan Duman
Graduate student - PhD
Faculty of Art and Social Sciences
Sabancı University

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/