Hi - I am trying to fit a Heckman selection model with three indicator
variables as the only independent variables. They are ipre, iin and ipost,
denoting membership in one of three "phases".
First, I run -heckman- with a constant and with ipre omitted (because
ipre+iin+ipost=1 for all observations):
. heckman y iin ipost, select(iin ipost) nolog

Heckman selection model                         Number of obs      =       225
(regression model with sample selection)        Censored obs       =       176
                                                Uncensored obs     =        49

                                                Wald chi2(2)       =     13.59
Log likelihood = -178.399                       Prob > chi2        =    0.0011

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
         iin |  -2.043379   .5862918    -3.49   0.000     -3.19249   -.8942678
       ipost |  -1.043764   .7195747    -1.45   0.147    -2.454105    .3665763
       _cons |   3.244112   .5383337     6.03   0.000     2.188997    4.299226
-------------+----------------------------------------------------------------
select       |
         iin |   -.438019   .2717842    -1.61   0.107    -.9707064    .0946683
       ipost |   -.160166   .3411495    -0.47   0.639    -.8288066    .5084747
       _cons |  -.4327397   .2498872    -1.73   0.083    -.9225096    .0570302
-------------+----------------------------------------------------------------
     /athrho |   3.096948   .4599552     6.73   0.000     2.195453    3.998444
    /lnsigma |    .729144   .1330982     5.48   0.000     .4682763    .9900116
-------------+----------------------------------------------------------------
         rho |   .9959246   .0037414                      .9755242    .9993272
       sigma |   2.073305   .2759531                      1.597239    2.691266
      lambda |   2.064855   .2785777                      1.518853    2.610858
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =    -3.12   Prob > chi2 =      .
------------------------------------------------------------------------------

Note that the estimate of rho is close to 1 (rho-hat=0.9959).
Now I re-run this model with no constant for the "y"-equation and ipre
included:
. heckman y ipre iin ipost, select(iin ipost) nolog noconst

Heckman selection model                         Number of obs      =       225
(regression model with sample selection)        Censored obs       =       176
                                                Uncensored obs     =        49

                                                Wald chi2(3)       =     69.47
Log likelihood = -176.8395                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
        ipre |    4.70526   .8244026     5.71   0.000     3.089461     6.32106
         iin |   4.351951    1.25856     3.46   0.001     1.885219    6.818684
       ipost |   4.946579   1.118222     4.42   0.000     2.754904    7.138253
-------------+----------------------------------------------------------------
select       |
         iin |  -.7462623   .2854159    -2.61   0.009    -1.305667   -.1868573
       ipost |  -.4851131   .3579489    -1.36   0.175     -1.18668    .2164538
       _cons |  -.1642108   .2626185    -0.63   0.532    -.6789335     .350512
-------------+----------------------------------------------------------------
     /athrho |  -5.16e-08   .9924865    -0.00   1.000    -1.945238    1.945238
    /lnsigma |  -.1444339   .1010153    -1.43   0.153    -.3424202    .0535523
-------------+----------------------------------------------------------------
         rho |  -5.16e-08   .9924865                     -.9599473    .9599473
       sigma |   .8655121   .0874299                      .7100498    1.055012
      lambda |  -4.46e-08   .8590091                     -1.683627    1.683627
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =    -0.00   Prob > chi2 =      .
------------------------------------------------------------------------------

If the two parameterizations are equivalent, the estimate of the "ipre"
coefficient (4.70) should be the same as _cons in the first model (3.24).
Also, the estimate of the "iin" coefficient (4.35) should be equal to
_cons + iin in the first model (1.20). Finally, the estimate of the "ipost"
coefficient (4.95) should be equal to _cons + ipost in the first model
(2.20). None of these are even close.
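For anyone who wants to verify the comparison, here is a quick check of the implied phase means from the first model. It uses only the coefficients printed in the first table and re-estimates nothing:

```stata
* Implied y-equation phase means from the first model's printed estimates
* ipre: _cons alone
display 3.244112
* iin: _cons + iin, about 1.20
display 3.244112 + (-2.043379)
* ipost: _cons + ipost, about 2.20
display 3.244112 + (-1.043764)
```

These are the values that the ipre, iin and ipost coefficients of the no-constant model would have to reproduce.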
Note that the log likelihood is higher here (-176.8395 versus -178.399) and
the estimate of rho is now essentially zero. Therefore (I guess) the
estimated coefficients for the "y"-equation are the same as under OLS fit to
the selected observations only, ignoring the nonselected observations:
. reg y ipre iin ipost, nocons

      Source |       SS       df       MS              Number of obs =      49
-------------+------------------------------           F(  3,    46) =  419.51
       Model |  1004.26774     3  334.755914           Prob > F      =  0.0000
    Residual |  36.7064468    46  .797966235           R-squared     =  0.9647
-------------+------------------------------           Adj R-squared =  0.9624
       Total |  1040.97419    49  21.2443712           Root MSE      =  .89329

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        ipre |    4.70526    .282483    16.66   0.000     4.136652    5.273869
         iin |   4.351951   .1604395    27.13   0.000     4.029003    4.674899
       ipost |   4.946579   .3158256    15.66   0.000     4.310855    5.582302
------------------------------------------------------------------------------

So apparently two different (and simple) parameterizations of the same
model can give completely different results. I suspect this has something
to do with how the starting values (first guesses) are calculated. Is this
true and, if so, what can be done to improve the robustness of this
procedure? How can we believe anything we get from it?
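One check I could run on the starting-value suspicion, sketched here under stated assumptions: refit the first parameterization starting from the second fit's solution, translated by hand (_cons = 4.70526, iin = 4.351951 - 4.70526, ipost = 4.946579 - 4.70526, athrho rounded to 0). The matrix name b0 is arbitrary. I am also assuming that the from() maximize option with its copy suboption reads the values in the order the coefficient table prints them. I have not run this.

```stata
* Sketch only: needs the same data in memory (y, iin, ipost).
* Values are the second fit's estimates re-expressed for the first model.
* Assumed order: y:iin y:ipost y:_cons select:iin select:ipost select:_cons
*                athrho lnsigma
matrix b0 = (4.351951 - 4.70526, 4.946579 - 4.70526, 4.70526, -.7462623, -.4851131, -.1642108, 0, -.1444339)
* nolog is left off so the iteration log shows where the optimizer goes
heckman y iin ipost, select(iin ipost) from(b0, copy)
```

If this fit stays near a log likelihood of -176.8395, then the first run above stopped at a different, lower maximum, and the discrepancy comes from where the iterations start rather than from the parameterization itself.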
Al Feiveson