Dear statalisters,
I am new to Stata and I am not sure if the estimation procedure below
makes sense or if there are any pitfalls I may have ignored. Any help
would be greatly appreciated.
My overall goal is to estimate the propensity to cooperate (yes/no)
as a function of a few independent variables.
1) As the first part of my estimation, I want to correct for selection bias,
since only the innovators had to answer whether they cooperate or not. I
tried two ways:
one is the procedure suggested in
http://www.stata.com/statalist/archive/2006-09/msg00772.html and the
second one is a heckprob, more precisely:
xi: heckprob coop i.independent_varlist, select(inno = i.independent_varlist) first
In fact, I am not interested in the outcome equation of heckprob, but only in
the selection equation, since I want to determine the inverse Mills ratio.
After heckprob, I calculate the linear prediction of the selection equation
predict psel, xbsel
generate IMR = normalden(psel)/normal(psel)
I use the calculated values of IMR as an additional regressor in the
substantive equation in 2).
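
To make this step concrete, here is a minimal sketch of the classic two-step
variant, in which the selection equation is estimated by a separate probit
rather than taken from heckprob. The names x1, x2, psel2 and IMR2 are
hypothetical placeholders, not variables from the actual data set. As far as
I can tell, predict with xbsel after heckprob uses the jointly estimated
selection coefficients, so the two versions of the ratio need not coincide
exactly.

```stata
* Sketch only: x1, x2, psel2 and IMR2 are hypothetical placeholder names
* Selection equation: innovator yes/no
xi: probit inno i.x1 i.x2
* Linear prediction of the selection equation
predict double psel2, xb
* Inverse Mills ratio: standard normal density over standard normal cdf
generate double IMR2 = normalden(psel2) / normal(psel2)
* The ratio is only used for the selected observations (innovators)
summarize IMR2 if inno == 1
```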
I have two questions: Is there a difference between using heckprob and the
procedure described in the link above (which was for a logit)? I
understand that one assumes a probit and the other a logit
distribution, but apart from this? As far as I can see from the results
of the second equation, there seems to be no difference.
Does it cause any problems if I only have categorical variables as
regressors?
2) The second part, the substantive equation, is supposed to account for
the selection bias of being an innovator or not AND the endogeneity of
R&D intensity, which I handle with instrumental variables.
I consider R&D intensity an endogenous variable, because theory says
that the propensity to cooperate and R&D intensity are interdependent.
Since cooperation is a dichotomous variable, I use
xi: ivprobit coop i.independent_varlist IMR (R&D_int = instruments)
The results of ivprobit show a significant rho (-0.9), indicating that
R&D_int is in fact endogenous (as far as I understand).
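
As a sketch of what this step looks like with concrete names: rd_int (Stata
variable names cannot contain "&"), the instruments z1 and z2, and the
categorical regressors x1 and x2 are all hypothetical placeholders. The
footer of the ivprobit output reports a Wald test of exogeneity, which is the
test behind the statement about rho. The equation name coop in the test
command assumes that ivprobit labels the main equation after the dependent
variable.

```stata
* Sketch only: rd_int, x1, x2, z1 and z2 are hypothetical placeholder names
xi: ivprobit coop i.x1 i.x2 IMR (rd_int = z1 z2)
* The output footer reports the Wald test of exogeneity (H0: rho = 0)
* Wald test of the IMR coefficient in the coop equation
test [coop]IMR
```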
Unfortunately, not only does the coefficient of the IMR turn out to be
totally insignificant, but many regressors have become insignificant,
too. What does this mean and how can I solve this problem? I suspect that
the insignificance of the IMR coefficient may be related to the fact that
the LR test of independent equations in the preceding heckprob estimation
gave Prob>chi2=0.1461, but I am not sure if this tips the scales.
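
For reference, a sketch of how the quantities mentioned here can be read back
after heckprob. The regressor names x1 and x2 are hypothetical placeholders,
and the stored-result names e(rho), e(chi2_c) and e(p_c) are an assumption
about what heckprob saves for the correlation and the test of independent
equations; ereturn list shows what is actually available.

```stata
* Sketch only: x1 and x2 are hypothetical placeholder names
xi: heckprob coop i.x1 i.x2, select(inno = i.x1 i.x2)
* List all stored results to confirm the names used below
ereturn list
* Estimated correlation between the two equations' errors
display "rho = " e(rho)
* Test of independent equations (H0: rho = 0)
display "chi2 = " e(chi2_c) "   Prob > chi2 = " e(p_c)
```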
I know these are a lot of questions, but maybe someone could show me the
right path. Maybe there are further aspects which I forgot to consider.
Thanks a lot,
melanie
-
melanie baier
c-lab