On 2003-09-30 Fabrizio Gilardi wrote:
> I've run a heckman selection model with the two step procedure and
> asked Stata to compute predicted probabilities of selection:
> heckman y1 x1, select(x2) twostep
> predict pse1
> Then I've run a probit with the same independent variables as in the
> selection stage of the heckman. The dependent variable is 1 if the
> outcome is observed in the heckman and 0 otherwise. Then I ask Stata
> to compute the predicted probabilities of a positive outcome:
> probit y2 x2
> predict p
> The problem is that when I compare pse1 and p, it turns out that they
> are quite different. p>pse1 for 80 percent of the observations. I
> don't understand why.
The problem is that Fabrizio did not specify the -psel- option
with -predict-. By default, -predict- after -heckman- calculates
the linear prediction of the regression (outcome) equation, the same
as specifying -xb-. Thus, his "pse1" variable contains linear
predictions of y1, not probabilities. To have "pse1" contain the
probability of being selected, Fabrizio would need to type
. predict pse1, psel
(Note that the last character of the option is a lowercase el, not
the number one.)
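The distinction between the default and -psel- can be checked
directly. What follows is a minimal sketch, not part of the original
exchange, assuming Stata 9 or later (for the -normal()- function;
older releases call it -norm()-) and that it is run as a do-file. The
variable names yhat_default, yhat_xb, zg, ps, and ps_check are made up
for illustration. It confirms that -predict- with no option matches
-xb-, and that -psel- is the standard normal CDF applied to the
selection equation's linear prediction (-xbsel-).

```stata
webuse womenwk, clear
heckman wage educ age, select(married children educ age) twostep

* No option: linear prediction of the wage (outcome) equation
predict double yhat_default
* Same quantity, requested explicitly
predict double yhat_xb, xb
assert reldif(yhat_default, yhat_xb) < 1e-8

* Linear prediction of the selection equation, and Pr(selected)
predict double zg, xbsel
predict double ps, psel

* Pr(selected) should equal normal(zg) up to rounding
generate double ps_check = normal(zg)
assert reldif(ps, ps_check) < 1e-8

summarize yhat_default ps
```

The predictions are stored as doubles so that the comparisons are not
affected by float rounding; the 1e-8 tolerance is an arbitrary choice.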
Here is a quick example verifying that -predict, psel- after
-heckman- reproduces the predicted probabilities from -probit-:
. webuse womenwk, clear
. heckman wage educ age, select(married children educ age) twostep
. predict pse1, psel
. gen wagehere = (wage < .)
. probit wagehere married children educ age
. predict phat, p
. summ phat pse1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        phat |      2000    .6712325    .2194172   .1171614   .9991774
        pse1 |      2000    .6712325    .2194172   .1171614   .9991774
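Matching summary statistics are strong evidence but do not by
themselves show that the two variables agree observation by
observation. A hedged sketch of a stricter check, again assuming a
do-file and using a made-up variable name (diff) and an arbitrary
tolerance of 1e-6, is:

```stata
webuse womenwk, clear
heckman wage educ age, select(married children educ age) twostep
predict double pse1, psel

* Indicator for whether the wage is observed
generate byte wagehere = (wage < .)
probit wagehere married children educ age
predict double phat, pr

* Largest absolute difference across observations
generate double diff = abs(phat - pse1)
summarize diff

* Fails with an error if any observation differs by 1e-6 or more
assert diff < 1e-6
```

Because the first step of the two-step estimator is a probit on the
selection equation, any remaining differences should be on the order
of the optimizer's convergence tolerance.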
-- Brian