Antoine Terracol <[email protected]> followed up with some
timing comparisons between -ml- using the -d2- evaluator I posted and
-optimize()- using the -v2- evaluator:
> Playing with your code, I noticed that -optimize- is indeed faster than
> -ml- for small sample sizes, but that the reverse seems to be true for
> larger N
>
> I ran a few simulations (-simulate-, 100 replications) with varying N.
>
> Below are the results using -v2- and -d2- (the same pattern holds for
> v1/d1 and v0/d0):
>
> N=100
>     Variable |  Obs      Mean   Std. Dev.     Min     Max
> -------------+-------------------------------------------
>     optimize |  100    .00423    .0069947       0    .016
>           ml |  100    .05278    .0082298    .046    .078
>
> N=1000
>     Variable |  Obs      Mean   Std. Dev.     Min     Max
> -------------+-------------------------------------------
>     optimize |  100    .03333    .0061727    .015    .062
>           ml |  100    .07604    .0079085    .062    .094
>
> N=5000
>     Variable |  Obs      Mean   Std. Dev.     Min     Max
> -------------+-------------------------------------------
>     optimize |  100    .16598     .015475     .14    .218
>           ml |  100    .18982     .014276    .171     .25
>
> N=10000
>     Variable |  Obs      Mean   Std. Dev.     Min     Max
> -------------+-------------------------------------------
>     optimize |  100    .33812    .0282104    .296    .422
>           ml |  100    .33324    .0244937    .296    .406
>
> N=50000
>     Variable |  Obs      Mean   Std. Dev.     Min     Max
> -------------+-------------------------------------------
>     optimize |  100   2.00641    .2400445   1.625   2.875
>           ml |  100   1.73035    .2316291   1.406   2.719
I forgot to mention something in my previous email, which Bill touched upon in
his reply. The type -v- evaluators of -optimize()- (v0, v1, and v2) are not
directly comparable with the type -d- evaluators of -ml- because they carry
extra overhead. This overhead is principally due to the parameter-level score
matrix: as the dataset gets longer and wider, the score matrix grows too, and
each iteration requires the evaluator to produce a new score matrix.
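To make the overhead concrete, here is a minimal sketch of what each
evaluator type must return to -optimize()-. This is my own illustration,
not part of the code below; the function and argument names are made up.

mata:
// A type-v2 evaluator fills in an N x 1 vector of observation-level
// values and an N x k score matrix S at every iteration:
void sketch_v2(todo, beta, y, X, v, S, H)
{
	// v: N x 1 observation-level log likelihoods
	// S: N x k parameter-level scores -- grows with N and k
	// H: k x k Hessian
	return
}
// A type-d2 evaluator returns only a scalar, a 1 x k gradient,
// and a k x k Hessian -- no per-observation storage:
void sketch_d2(todo, beta, y, X, lnf, g, H)
{
	// lnf: scalar log likelihood
	// g:   1 x k gradient
	// H:   k x k Hessian
	return
}
end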
I originally stayed with the type -v- evaluator for -optimize()- because I
assumed that Antoine might want to use the -bhhh- technique or produce robust
variance estimates (neither of which is possible with the type -d- evaluators
in -optimize()-). -ml- can employ the -bhhh- technique and produce robust
variance estimates with -d1- and -d2- evaluators that return the
equation-level scores, because it uses tools that apply the chain rule to the
linear predictions of each equation. As Bill said, -optimize()- has no
concept of equations and thus cannot apply the chain rule for us.
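To illustrate the chain-rule step (again my own sketch with made-up names,
not part of the code below): for a single linear-index equation, multiplying
each observation's equation-level score by its covariate row yields the
parameter-level scores, and summing those rows gives the gradient.

mata:
// Sketch only: chain rule from equation-level to parameter-level
// scores for a toy probit.  All names here are illustrative.
N  = 1000
X  = rnormal(N, 2, 0, 1), J(N, 1, 1)	// two covariates plus a constant
b  = (1, 1, 1)
y  = (X*b' + rnormal(N, 1, 0, 1)) :> 0
pm = 2*y :- 1
xb = X*b'
s  = pm :* normalden(xb) :/ normal(pm:*xb)	// equation-level scores
S  = s :* X			// chain rule: N x 3 parameter-level scores
cross(s, X) \ colsum(S)		// the gradient two ways -- identical rows
end

Once the parameter-level score matrix S exists, both the -bhhh- technique and
the robust (sandwich) VCE build on cross(S, S); that is the step -optimize()-
cannot perform for us from a type -d- evaluator, because it never sees the
equation-level scores.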
I have a type -d- evaluator that Antoine can use with -optimize()- to get a
better timing comparison between -ml- and -optimize()- (when you do not need
the -bhhh- technique or a robust VCE); see the Stata code at the end of this
email. This probit likelihood evaluator employs the chain rule to produce the
gradient, much as -ml-'s -mlvecsum- command does.
Here are some results from a single timing run on my machine; timer 1 is the
-optimize()- run and timer 2 is the -ml- run, and the -evaltype- local at the
top of the code selects d0, d1, or d2:
For d2:
. timer list
1: 1.48 / 1 = 1.4780
2: 1.70 / 1 = 1.7000
For d1:
. timer list
1: 4.32 / 1 = 4.3170
2: 5.07 / 1 = 5.0660
For d0:
. timer list
1: 6.08 / 1 = 6.0790
2: 6.82 / 1 = 6.8190
--Jeff
[email protected]
***** BEGIN:
clear all
local evaltype d1
set mem 20m
set obs 50000
drawnorm x1 x2 eps
gen cons=1
gen y=(1+x1+x2+eps>0)
timer clear 1
timer clear 2
mata:
void lnprobit_d2(
	real scalar	todo,
	real rowvector	beta,
	real colvector	y,
	real matrix	X,
	real scalar	lnf,
	real rowvector	g,
	real matrix	H
	)
{
	real colvector	pm
	real colvector	xb
	real colvector	lj
	real colvector	dllj
	real colvector	d2llj

	pm = 2*(y :!= 0) :- 1			// +1 if y==1, -1 if y==0
	xb = X*beta'				// linear predictions
	lj = normal(pm:*xb)			// observation-level likelihoods
	if (any(lj :== 0)) {
		lnf = .
		return
	}
	lnf = quadcolsum(ln(lj))		// overall log likelihood
	if (todo == 0 | missing(lnf)) return

	dllj = pm :* normalden(xb) :/ lj	// equation-level scores
	if (missing(dllj)) {
		lnf = .
		return
	}
	g = cross(dllj, X)			// chain rule: 1 x k gradient
	if (todo == 1) return

	d2llj = dllj :* (dllj + xb)		// minus d2 lnf/d(xb)^2, by obs
	if (missing(d2llj)) {
		lnf = .
		return
	}
	H = - cross(X, d2llj, X)		// Hessian w.r.t. beta
}
timer_on(1)
y = x = .
st_view(y, ., "y")
st_view(x, ., ("x1","x2","cons"))

S = optimize_init()
optimize_init_evaluator(S, &lnprobit_d2())
optimize_init_evaluatortype(S, "`evaltype'")
optimize_init_params(S, (0.1, 0.5, 0))
optimize_init_argument(S, 1, y)
optimize_init_argument(S, 2, x)

betahat  = optimize(S)
sigmahat = sqrt(diagonal(optimize_result_V_oim(S)))'
tstat    = betahat :/ sigmahat
pval     = 2 :* (1 :- normal(abs(tstat)))
(betahat \ sigmahat \ tstat \ pval)'	// b, se, t, p-value table
timer_off(1)
end
program myprobit_d2
	version 10
	args todo b lnf g negH g1
	tempvar xb lj
	mleval `xb' = `b'
	quietly {
		// observation-level likelihoods
		gen double `lj' = normal( `xb') if $ML_y1 == 1
		replace    `lj' = normal(-`xb') if $ML_y1 == 0
		mlsum `lnf' = ln(`lj')
		if (`todo'==0 | `lnf' >= .) exit

		// equation-level scores
		replace `g1' =  normalden(`xb')/`lj' if $ML_y1 == 1
		replace `g1' = -normalden(`xb')/`lj' if $ML_y1 == 0
		mlvecsum `lnf' `g' = `g1', eq(1)
		if (`todo'==1 | `lnf' >= .) exit

		// negative Hessian
		mlmatsum `lnf' `negH' = `g1'*(`g1'+`xb'), eq(1,1)
	}
end
ml model `evaltype' myprobit_d2 (y=x1 x2)
mat I = (0.1, 0.5, 0)
ml init I, copy
timer on 2
ml max, search(off)
timer off 2
timer list
***** END: