Stata The Stata listserver

st: -heckman- with model reparameterization


From   "FEIVESON, ALAN H. (AL) (JSC-SD) (NASA)" <[email protected]>
To   "'statalist'" <[email protected]>
Subject   st: -heckman- with model reparameterization
Date   Fri, 24 Jan 2003 10:41:56 -0600

Hi - I am attempting to run a heckman selection regression model with three
indicator variables as the only independent variables. In particular they
are ipre, iin and ipost, denoting membership in one of three "phases".

First, I run -heckman- with a constant and ipre omitted (because
ipre+iin+ipost=1 for all observations):
. heckman y  iin ipost,select(iin ipost)  nolog

Heckman selection model                         Number of obs      =       225
(regression model with sample selection)        Censored obs       =       176
                                                Uncensored obs     =        49

                                                Wald chi2(2)       =     13.59
Log likelihood =  -178.399                      Prob > chi2        =    0.0011

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
         iin |  -2.043379   .5862918    -3.49   0.000     -3.19249   -.8942678
       ipost |  -1.043764   .7195747    -1.45   0.147    -2.454105    .3665763
       _cons |   3.244112   .5383337     6.03   0.000     2.188997    4.299226
-------------+----------------------------------------------------------------
select       |
         iin |   -.438019   .2717842    -1.61   0.107    -.9707064    .0946683
       ipost |   -.160166   .3411495    -0.47   0.639    -.8288066    .5084747
       _cons |  -.4327397   .2498872    -1.73   0.083    -.9225096    .0570302
-------------+----------------------------------------------------------------
     /athrho |   3.096948   .4599552     6.73   0.000     2.195453    3.998444
    /lnsigma |    .729144   .1330982     5.48   0.000     .4682763    .9900116
-------------+----------------------------------------------------------------
         rho |   .9959246   .0037414                      .9755242    .9993272
       sigma |   2.073305   .2759531                      1.597239    2.691266
      lambda |   2.064855   .2785777                      1.518853    2.610858
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =    -3.12   Prob > chi2 =      .
------------------------------------------------------------------------------

Note that the estimate of rho is close to 1 (rho-hat=0.9959).


Now I re-run this model with no constant for the "y"-equation and ipre
included:
. heckman y ipre iin ipost,select(iin ipost)  nolog noconst

Heckman selection model                         Number of obs      =       225
(regression model with sample selection)        Censored obs       =       176
                                                Uncensored obs     =        49

                                                Wald chi2(3)       =     69.47
Log likelihood = -176.8395                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
        ipre |    4.70526   .8244026     5.71   0.000     3.089461     6.32106
         iin |   4.351951    1.25856     3.46   0.001     1.885219    6.818684
       ipost |   4.946579   1.118222     4.42   0.000     2.754904    7.138253
-------------+----------------------------------------------------------------
select       |
         iin |  -.7462623   .2854159    -2.61   0.009    -1.305667   -.1868573
       ipost |  -.4851131   .3579489    -1.36   0.175     -1.18668    .2164538
       _cons |  -.1642108   .2626185    -0.63   0.532    -.6789335     .350512
-------------+----------------------------------------------------------------
     /athrho |  -5.16e-08   .9924865    -0.00   1.000    -1.945238    1.945238
    /lnsigma |  -.1444339   .1010153    -1.43   0.153    -.3424202    .0535523
-------------+----------------------------------------------------------------
         rho |  -5.16e-08   .9924865                     -.9599473    .9599473
       sigma |   .8655121   .0874299                      .7100498    1.055012
      lambda |  -4.46e-08   .8590091                     -1.683627    1.683627
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =    -0.00   Prob > chi2 =      .
------------------------------------------------------------------------------

The estimate of the "ipre" coefficient (4.71) should be the same as _cons in
the first model (3.24). Also, the estimate of the "iin" coefficient (4.35)
should be equal to _cons + iin in the first model (1.20). Finally, the
estimate of the "ipost" coefficient (4.95) should be equal to _cons + ipost
in the first model (2.20). None of these are even close.
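
To make the comparison explicit, here is a minimal Python sketch of the arithmetic. The numbers are copied from the two -heckman- outputs above; the dictionary names (fit1, fit2, implied) are only illustrative labels, not anything Stata produces.

```python
# Outcome-equation coefficients copied from the two -heckman- fits above.
fit1 = {"_cons": 3.244112, "iin": -2.043379, "ipost": -1.043764}  # constant, ipre omitted
fit2 = {"ipre": 4.70526, "iin": 4.351951, "ipost": 4.946579}      # noconst, ipre included

# Because ipre + iin + ipost = 1, the two parameterizations should map as:
#   ipre <-> _cons,   iin <-> _cons + iin,   ipost <-> _cons + ipost
implied = {
    "ipre": fit1["_cons"],
    "iin": fit1["_cons"] + fit1["iin"],
    "ipost": fit1["_cons"] + fit1["ipost"],
}

# Print the phase means implied by fit 1 next to the fit 2 coefficients.
for phase in ("ipre", "iin", "ipost"):
    gap = fit2[phase] - implied[phase]
    print(f"{phase:5s}  implied by fit 1: {implied[phase]:.4f}   "
          f"fit 2: {fit2[phase]:.4f}   gap: {gap:.4f}")
```

If the two fits had reached the same maximum, every gap would be zero up to rounding.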

Note that the log likelihood is higher (-176.84 versus -178.40) and the
estimate of rho is now essentially zero. Therefore (I guess) the estimated
coefficients for the "y"-equation are the same as under OLS, ignoring the
nonselected observations:
. reg y ipre iin ipost,nocons

      Source |       SS       df       MS              Number of obs =      49
-------------+------------------------------           F(  3,    46) =  419.51
       Model |  1004.26774     3  334.755914           Prob > F      =  0.0000
    Residual |  36.7064468    46  .797966235           R-squared     =  0.9647
-------------+------------------------------           Adj R-squared =  0.9624
       Total |  1040.97419    49  21.2443712           Root MSE      =  .89329

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        ipre |    4.70526    .282483    16.66   0.000     4.136652    5.273869
         iin |   4.351951   .1604395    27.13   0.000     4.029003    4.674899
       ipost |   4.946579   .3158256    15.66   0.000     4.310855    5.582302
------------------------------------------------------------------------------
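
One further check on the guess that the second -heckman- fit has collapsed to OLS: with rho = 0 the outcome part of the likelihood is an ordinary normal regression, so the maximum-likelihood sigma should be sqrt(RSS/N) rather than the Root MSE sqrt(RSS/(N-k)). A small Python sketch, using only figures printed in the outputs above (variable names are illustrative):

```python
import math

# Figures copied from the -reg- output above.
rss = 36.7064468      # residual sum of squares
n_selected = 49       # uncensored observations
resid_df = 46         # residual degrees of freedom (49 - 3 coefficients)

# ML estimate of sigma divides by N; Root MSE divides by the residual df.
sigma_ml = math.sqrt(rss / n_selected)
root_mse = math.sqrt(rss / resid_df)

# sigma reported by the second -heckman- fit (noconst, ipre included).
sigma_heckman = 0.8655121

print(f"sqrt(RSS/N)      = {sigma_ml:.7f}")
print(f"sqrt(RSS/(N-k))  = {root_mse:.5f}")
print(f"heckman sigma    = {sigma_heckman:.7f}")
```

The first and third figures agree to the printed precision, which is consistent with the second fit being OLS on the selected observations plus a separate probit for selection.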

So apparently, with two different (and simple) parameterizations of the same
model, we can get completely different results. I suspect this has something
to do with how first guesses are calculated. Is this true and if so, what
can be done to improve the robustness of this procedure? How can we believe
anything we get from it?
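
As a sanity check on the claim that these are two parameterizations of the same model, the sketch below evaluates the standard selection-model log likelihood under both codings at matching parameter values. Everything here is an assumption made for illustration: the toy data, the parameter values, and the helper names (norm_cdf, heckman_loglik, dummies) are hypothetical and are not the data or code behind the output above.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def heckman_loglik(rows, beta, gamma, rho, sigma):
    """Selection-model log likelihood.

    rows holds (x, z, y) tuples; y is None for a nonselected observation.
    """
    total = 0.0
    for x, z, y in rows:
        zg = sum(zi * gi for zi, gi in zip(z, gamma))
        if y is None:
            # Nonselected: contributes log Pr(not selected).
            total += math.log(norm_cdf(-zg))
        else:
            # Selected: selection probability given y, plus the normal density of y.
            e = (y - sum(xi * bi for xi, bi in zip(x, beta))) / sigma
            total += math.log(norm_cdf((zg + rho * e) / math.sqrt(1.0 - rho ** 2)))
            total += -0.5 * e * e - math.log(math.sqrt(2.0 * math.pi) * sigma)
    return total

# Hypothetical toy data: (phase, y), with y = None when y is not observed.
toy = [("pre", 4.1), ("pre", None), ("in", 3.9), ("in", None),
       ("in", None), ("post", 5.2), ("post", None)]

def dummies(phase):
    """Return (ipre, iin, ipost) indicators for a phase label."""
    return (float(phase == "pre"), float(phase == "in"), float(phase == "post"))

rows_a, rows_b = [], []
for phase, y in toy:
    ipre, iin, ipost = dummies(phase)
    z = (iin, ipost, 1.0)                       # selection equation, same in both
    rows_a.append(((iin, ipost, 1.0), z, y))    # coding A: constant, ipre omitted
    rows_b.append(((ipre, iin, ipost), z, y))   # coding B: noconst, ipre included

# Arbitrary illustrative parameter values (not estimates).
cons, b_in, b_post = 3.2, -2.0, -1.0
gamma, rho, sigma = (-0.4, -0.2, -0.4), 0.5, 2.0

ll_a = heckman_loglik(rows_a, (b_in, b_post, cons), gamma, rho, sigma)
ll_b = heckman_loglik(rows_b, (cons, cons + b_in, cons + b_post), gamma, rho, sigma)
print(ll_a, ll_b)  # the two values agree up to floating-point rounding
```

Since the likelihood surface is the same under either coding, the maximized log likelihoods should coincide. In the output above they differ by about 1.56 (-178.399 versus -176.8395), and twice that gap reproduces the -3.12 on the LR line of the first fit, which suggests the first fit stopped at a point with a lower likelihood than the rho = 0 solution.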

Al Feiveson


