Hello,
I am writing an ml program that replicates the "oprobit" command in Stata.
Despite many trials, I have not succeeded in getting exactly the same
estimates as the "oprobit" command gives. Below is the basic program I used
to compare with the oprobit estimates.
**************************************
capture program drop myoprobit
program define myoprobit
args lnf xb t1 t2 t3
tempvar p1 p2 p3 p4
qui gen double `p1'=ln(norm(`t1'-`xb'))
qui gen double `p2'=ln(norm(`t2'-`xb')-norm(`t1'-`xb'))
qui gen double `p3'=ln(norm(`t3'-`xb')-norm(`t2'-`xb'))
qui gen double `p4'=ln(norm(-`t3'+`xb'))
qui replace `lnf'=($ML_y1==1)*`p1' + ($ML_y1==2)*`p2' /*
*/ +($ML_y1==3)*`p3' + ($ML_y1==4)*`p4'
end
clear
sysuse auto
replace rep=2 if rep==. | rep==1
replace rep=rep-1
xi: ml model lf myoprobit (rep =mpg i.turn, nocons) (tau1:) (tau2:) (tau3:)
ml maximize
*******************************************
The two sets of results look similar at first glance, but if you take a
closer look, the coefficients on _Iturn_32 and _Iturn_46 are different. The
differences may be negligible with auto.dta, but they are amplified with my
dataset, again for the dummy variables.
To make sure, I made variations on the base model above in the following
ways:
1) Using an alternative distribution function: norm vs. normprob
2) Defining p4, the last probability, differently:
`p4'=ln(1-norm(`t3'-`xb')) vs. `p4'=ln(norm(-`t3'+`xb'))
3) With or without an equation name:
(rep =mpg i.turn, nocons) vs. (auto: rep =mpg i.turn, nocons)
4) With the default ml tolerance and ltolerance vs. with the Stata internal
values, that is, tolerance(1e-4) and ltolerance(0)
By combining these four alternatives, I got 16 variations of the ml ordered
probit program. The coefficients on _Iturn_32 and _Iturn_46 still differ
across the 16 variations, and none of the 16 models produces the same
estimates as the "oprobit" command. To my knowledge, modifications (1), (2)
and (3) should not make any difference to the estimates.
My questions are
1) How can I replicate the oprobit results, including the coefficients on
the dummies?
2) Why do the three variations above result in different estimates for
some coefficients? As far as I understand,
- normprob calculates the same cdf of the normal distribution as norm.
- The last probability can be defined either way, since
ln(1-norm(`t3'-`xb')) = ln(norm(-`t3'+`xb')). I have read a Statalist post
that recommends ln(norm(-`t3'+`xb')) over ln(1-norm(`t3'-`xb')):
http://www.stata.com/statalist/archive/2003-05/msg00076.html
- Including an equation name should not affect the estimation.
3) With my dataset, changing the tolerance/ltolerance levels made a big
difference in some of the dummy estimates. What is the best way to choose
these tolerance levels?
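On the second point in question 2, here is a minimal check of the two forms of the last log probability, evaluated directly with display. The values 1, 5 and 9 for x = t3 - xb are made up for illustration, and the check assumes norm() is available as in my program. The two forms are equal algebraically, but for large x I would expect norm(x) to round to 1 in double precision, so that 1-norm(x) becomes 0 and its log is missing, while ln(norm(-x)) stays finite.

```stata
* Compare the two algebraically equal forms of the last log probability.
* x stands for t3 - xb; the values 1, 5 and 9 are illustrative only.
foreach x of numlist 1 5 9 {
    display "x = `x'"
    display "  ln(1-norm(x)) = " %12.0g ln(1-norm(`x'))
    display "  ln(norm(-x))  = " %12.0g ln(norm(-`x'))
}
```

If the two forms already disagree in the later digits at moderate x, that may explain part of the difference between variations in (2), though probably not all of it.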
I'd be very grateful for any suggestions.
Sunhwa