David,
Let me answer the second question first. To my knowledge, the IMR terms
are functions of the predicted probabilities of the various outcomes in
your first-stage mlogit regression.
For your four outcomes, you first have to create variables holding the
predicted probability of each outcome. Right after running the mlogit,
you can type (the trailing semicolons assume #delimit ; is in effect):
predict p0 if e(sample), outcome(0);
predict p1 if e(sample), outcome(1);
predict p2 if e(sample), outcome(2);
predict p3 if e(sample), outcome(3);
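As a quick sanity check (a sketch, not something the formulas require),
the four predicted probabilities should sum to one for every observation
in the estimation sample. It is also worth looking at the minimum of each
one, because the next step takes logs and ln(0) is missing in Stata. The
1e-5 tolerance is an arbitrary allowance for float rounding:
assert abs(p0 + p1 + p2 + p3 - 1) < 1e-5 if e(sample);
summarize p0 p1 p2 p3 if e(sample);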
You then use p0, p1, p2 and p3 to create the three Mills ratio terms.
Following the formulas given by Dubin and McFadden (Econometrica, 1984),
the code below would create the Mills terms. You should check whether
these formulas are appropriate for your particular problem.
gen trnsp0=(p0*ln(p0))/(1-p0);
gen trnsp1=(p1*ln(p1))/(1-p1);
gen trnsp2=(p2*ln(p2))/(1-p2);
gen trnsp3=(p3*ln(p3))/(1-p3);
gen millsp1=3*ln(p1)+ trnsp0 +trnsp2 +trnsp3;
gen millsp2=3*ln(p2)+ trnsp0 +trnsp1 +trnsp3;
gen millsp3=3*ln(p3)+ trnsp0 +trnsp1 +trnsp2;
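Written out as algebra, and only as a restatement of the gen commands
above (with P_k the predicted probability of outcome k, k = 0,...,3, and
j = 1, 2, 3), each Mills term is:
\mathrm{millsp}_j = \sum_{k \ne j} \left[ \frac{P_k \ln P_k}{1 - P_k} + \ln P_j \right] = 3 \ln P_j + \sum_{k \ne j} \frac{P_k \ln P_k}{1 - P_k}
Note that the three pairwise pieces are added together with equal
weight, so each millsp_j enters the second stage with a single
coefficient.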
You can then include millsp1-millsp3 as regressors in your second-stage
logit. If you are interested in the standard errors for the Mills ratio
terms in the second-stage logit, more work has to be done, because those
terms are themselves estimated from the first stage. You should probably
bootstrap the standard errors.
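A minimal sketch of such a bootstrap, assuming the bootstrap prefix
syntax of later Stata releases. The variable names choice (first-stage
outcome), y (second-stage outcome), x1 and x2 (regressors in both stages)
and z1 (a regressor in the first stage only) are hypothetical, as are the
program name twostage, the 500 replications and the seed. Both stages sit
inside one program so that the Mills terms are rebuilt in every bootstrap
sample:
#delimit ;
capture program drop twostage;
program define twostage, eclass;
    /* drop variables left over from a previous call */
    capture drop p0 p1 p2 p3;
    capture drop trnsp0 trnsp1 trnsp2 trnsp3;
    capture drop millsp1 millsp2 millsp3;
    /* first stage and predicted probabilities */
    mlogit choice x1 x2 z1;
    predict p0 if e(sample), outcome(0);
    predict p1 if e(sample), outcome(1);
    predict p2 if e(sample), outcome(2);
    predict p3 if e(sample), outcome(3);
    /* Mills terms, same formulas as above */
    gen trnsp0=(p0*ln(p0))/(1-p0);
    gen trnsp1=(p1*ln(p1))/(1-p1);
    gen trnsp2=(p2*ln(p2))/(1-p2);
    gen trnsp3=(p3*ln(p3))/(1-p3);
    gen millsp1=3*ln(p1)+ trnsp0 +trnsp2 +trnsp3;
    gen millsp2=3*ln(p2)+ trnsp0 +trnsp1 +trnsp3;
    gen millsp3=3*ln(p3)+ trnsp0 +trnsp1 +trnsp2;
    /* second stage; its e(b) is what bootstrap collects */
    logit y x1 x2 millsp1 millsp2 millsp3;
end;
bootstrap _b, reps(500) seed(12345): twostage;
The standard errors reported for millsp1-millsp3 then reflect the
first-stage estimation as well. Replications in which either stage fails
to converge are worth inspecting rather than ignoring.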
As to your first question, in my opinion this is a fine thing to do, as
long as you have a variable that helps identify the covariance between
the first- and second-stage error terms, that is, one that enters the
first stage but not the second. If you use exactly the same set of
variables in both stages, then only the non-linearity of the Mills ratio
terms is used for identification, and according to the literature on
selection correction this is not a good way to proceed. There's a paper
in the Journal of Economic Surveys on the Heckman selection correction
that discusses these issues.
-Mushfiq
A. Mushfiq Mobarak
Assistant Professor of Economics
University of Colorado at Boulder
303-492-8872
Date: Mon, 07 Apr 2003 08:52:28 -0600
From: David Leblang <[email protected]>
Subject: st: Inverse Mills Ratio after MLOGIT
Listers,
I am trying to estimate a selection-type model in the tradition of the
heckprob command, except that the first stage has multiple outcomes
(four) and the second stage is a standard logit/probit. My approach is
to estimate the first stage as a multinomial logit, get the predicted
probabilities, and plug them into the second-stage logit. However,
because I assume that the errors from the first- and second-stage models
are correlated, I want to generate the inverse Mills ratio (IMR) from the
first-stage multinomial logit and add it to the second-stage equation
(this is discussed in Millimet's FAQ on endogeneity).
Here are my questions:
1. From a statistical point of view, does this make sense?
2. How can I obtain the IMR after the mlogit? I have searched the
FAQs, etc., but cannot find an answer.
Thanks,
David Leblang
University of Colorado