Hi everybody. I want to estimate the following simultaneous Tobit model :
H_star = alpha* y + X*beta + u
Y_star = Z*delta + e
Where H_star and Y_star are latent variables, both observed (i.e. as H and Y) when greater than zero.
The difficulty with such a model is obviously that the (endogenous) regressor "y" is a latent variable.
We denote "s_u", "s_e" and "s_eu" as the elements of the covariance matrix between error terms.
There is indeed for regime in the model (1: H=0 and y=0) (2: H>0 and Y=0) (3: H=0 and Y>0) (4: H>0 and Y>0).
Note that we have the following reduced form for H:
H= X*beta + e if H_star>0 and y=0
alpha*Z*delta + X*beta + u_et if H_star>0 and y>0
0 otherwise
Note that the coefficient associated with "y" in the structural equation (alpha) enters the reduced form associated with "Z*delta" in the reduced form equation.
When writing the likelihood function, the term (alpha*Z*delta) appears in regime 3 and regime 4 and "alpha" enters the expression for the covariance between the error terms in the reduced form equation ("e" and "u_et"). The problem I have is to tell stata how to "capture" the coefficient (and only the coefficient) "alpha" in the first structural equation. That is obviously needed in order to be able to write the likelihood function.
Here is my program (note that I test the program by fixing alpha to an arbitrary constant value and it seemed ok) :
capture program drop maxlik
program define maxlik
args lnf beta delta sig_u sig_e sig_ue
quietly capture dropvars xbeta zdelta
gen double xbeta=`beta'
gen double zdelta=`delta'
scalar s_u = exp(`sig_u')
scalar s_e = exp(`sig_e')
scalar s_ue = exp(`sig_ue')
scalar rho = tanh(s_ue/(s_u*s_e))
scalar alpha = xbeta[1,1]
scalar s_uet = (alpha^2)*s_u^2+s_e^2+2*alpha*s_ue
scalar rho_et = (alpha*s_u^2+s_ue)/(sqrt(s_u^2*(alpha^2*s_u^2+s_e^2+2*alpha*s_ue)))
scalar gamma = alpha*zdelta
quietly{
capture dropvars regime_1 regime_2 regime_3 regime_4
gen double regime_1 = 0.0
replace regime_1 = binorm(-xbeta/s_e,-zdelta/s_u,rho) if $ML_y1==0 & $ML_y2==0
gen double regime_2 = 0.0
replace regime_2 = norm((-zdelta/s_u-rho*($ML_y1-xbeta)/s_e)/sqrt(1-rho^2))*1/s_e*normden(($ML_y1-xbeta)/s_e) if $ML_y1>0 & $ML_y2==0
gen double regime_3 = 0.0
replace regime_3 = norm((-gamma-xbeta-rho_et*(s_uet/s_u)*($ML_y2-zdelta))/s_uet*sqrt(1-rho_et^2))*1/s_u*normden(($ML_y2-zdelta)/s_u) if $ML_y1==0 & $ML_y2>0
gen double regime_4 = 0.0
replace regime_4 = (1/(2*_pi*s_u*s_uet*sqrt(1-rho_et^2)))*exp((-1/2*(1-rho_et^2))*( (($ML_y2-zdelta)/s_u)^2-2*rho_et*(($ML_y2-zdelta)/s_u)*(($ML_y1-gamma-xbeta)/s_uet)+(($ML_y1-gamma-xbeta)/s_uet)^2)) if $ML_y1>0 & $ML_y2>0
}
quietly replace `lnf' = ln(regime_1) if $ML_y1==0 & $ML_y2==0
quietly replace `lnf' = ln(regime_2) if $ML_y1>0 & $ML_y2==0
quietly replace `lnf' = ln(regime_3) if $ML_y1==0 & $ML_y2>0
quietly replace `lnf' = ln(regime_4) if $ML_y1>0 & $ML_y2>0
end
use scf86_roc.dta, clear
ml model lf maxlik (beta: heures=revimp tn ipe ne nb man sas alb cb age_16_24 age_25_34 age_35_44 age_55_69 homme celibataire marie chef_fam educ_0_8 educ_postsec educ_univ francais etranger famille en_0_24 etudiant) (delta: revimp=tn ipe ne nb man sas alb cb age_16_24 age_25_34 age_35_44 age_55_69 homme celibataire marie chef_fam educ_0_8 educ_postsec educ_univ francais etranger famille en_0_24 etudiant) /sig_u /sig_e /sig_ue
ml maximize
I know all this is a bit confused. Feel free to ask for more precisions.
Olivier
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/