This isn't a Stata suggestion, but a math/numerical analysis one:
One big problem I have encountered with heteroscedastic models is that you need *much* better starting values than for homoscedastic models. Recall that numerical optimization by Newton's method and its variants is only locally convergent. A model like the MVN with an unconstrained covariance matrix, or another exponential-family model, is "nice" because the log-likelihood is globally concave, near (or exactly) quadratic, the location and scale parameters separate cleanly, and so on. Add heteroscedasticity predictors and, in general, none of this remains true. If you do not start close enough to the solution, you may end up in a region of the likelihood that is a local optimum, very ill behaved, or even singular.
Also, you may well be facing empirical, or even genuine, non-identification.
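For instance, with the example quoted below you could initialize every parameter, not just the mean equations. Here is a minimal sketch (untested, reusing Carlo's variable names and equation order, and switching to -ml init, copy- so that values are matched by position rather than by name): fit each equation by OLS, start the lnsigma constants at the log of the residual standard deviations, and start the ar parameters at the atanh of the residual correlations.
----------- begin sketch --------------
* untested sketch: starting values from equation-by-equation OLS
* (to be run after the -ml model- statement, in place of the -ml init- block)
quietly regress y1 x1 x2
matrix b1 = e(b)
predict double r1, residuals
quietly regress y2 x1 x2
matrix b2 = e(b)
predict double r2, residuals
quietly regress y3 x1 x2
matrix b3 = e(b)
predict double r3, residuals
* residual covariance matrix
quietly correlate r1 r2 r3, covariance
matrix V = r(C)
* lnsigma equations: slope on x1 = 0, constant = log residual std. dev.
scalar ls1 = log(sqrt(V[1,1]))
scalar ls2 = log(sqrt(V[2,2]))
scalar ls3 = log(sqrt(V[3,3]))
* atanh of the residual correlations for ar12, ar13, ar23
scalar a12 = atanh(V[1,2]/sqrt(V[1,1]*V[2,2]))
scalar a13 = atanh(V[1,3]/sqrt(V[1,1]*V[3,3]))
scalar a23 = atanh(V[2,3]/sqrt(V[2,2]*V[3,3]))
* positional order must match the equations in the -ml model- call
matrix b0 = b1, b2, b3, (0, ls1, 0, ls2, ls3, a12, a13, a23)
ml init b0, copy
----------- end sketch --------------
Following this with -ml search- can refine the values further; the point is that once all the parameters start in a sensible region, Newton-type steps are far more likely to stay in the well-behaved part of the likelihood.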
-----Original Message-----
From: "Carlo Fezzi" <[email protected]>
To: [email protected]
Sent: 2/23/2009 9:32 AM
Subject: st: heteroskedastic multivariate normal regression with ml, convergence problem
Dear all,
I am writing code to estimate a multivariate normal regression with
heteroskedastic errors via maximum likelihood, using -ml- with method
-lf-. I have a question specific to my model, but I guess the answer could
be useful to many beginner maximum-likelihood programmers like myself.
My multivariate normal code works really well when I estimate the model
with homoskedastic errors. However, if I add a simple parametric
specification for the variance, such as log(sigma_i) = b0 + b1*x1_i, the
maximization becomes much more difficult and I get unexpected results,
even when I generate the data to be homoskedastic, in which case I would
expect the parameters in the variance equations to be non-significant.
For instance, if you run the example below, one of the correlation
parameters ends up with missing standard error and confidence interval,
and the coefficients on x1 in the variance equations come out significant
even though the data are generated to be homoskedastic.
Is there anything I can do to improve convergence, either in the
structure of the code or in the maximization technique?
Thanks a lot for your help,
Carlo
This is my code:
----------- begin example --------------
set seed 1
clear
set obs 1000
global n_eq = 3
matrix r = (1, 0.35, 0.5 \ 0.35, 1, 0.2 \ ///
0.5, 0.2, 1)
drawnorm u1 u2 u3, corr(r)
gen x1 = uniform() -0.5
gen x2 = 2*uniform() + 0.5
gen y1 = 0.5 + 4*x1 + u1
gen y2 = -1 -2*x1 + 3*x2 + u2
gen y3 = 3 + 3*x1 - 2*x2 + u3
capture program drop multireg_lf
program multireg_lf
version 10.0
*** initialize beta vectors ***
local mu_k "mu1"
forvalues i = 2/$n_eq {
local mu_k "`mu_k' mu`i'"
}
*** initialize sigmas ***
local lnsigma_k "lnsigma1"
forvalues i = 2/$n_eq {
local lnsigma_k "`lnsigma_k' lnsigma`i'"
}
*** initialize correlations ***
local atr_kk ""
forvalues i = 1/$n_eq {
forvalues j = `=`i'+1'/$n_eq {
local atr_kk "`atr_kk' ar`i'`j'"
}
}
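** the argument order below must match the equation order in -ml model- **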
args lnf `mu_k' `lnsigma_k' `atr_kk'
** display "`mu_k'" "`lnsigma_k'" "`atr_kk'"
*** compute residuals ***
forvalues i=1/$n_eq {
tempvar e`i'
gen double `e`i'' = (${ML_y`i'} - `mu`i'')
}
tempname S invS ip is
*** compute covariance matrix ***
matrix `S' = J($n_eq,$n_eq,0)
** variances **
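** note: -summ, meanonly- collapses the lnsigma linear predictor to its **
** sample mean, so a single covariance matrix is used for all observations **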
forvalues i = 1/$n_eq {
summ `lnsigma`i'', meanonly
matrix `S'[`i',`i']=exp(r(mean))^2
}
** co-variances **
forvalues i = 1/$n_eq {
forvalues j = `=`i'+1'/$n_eq {
summ `ar`i'`j'', meanonly
matrix `S'[`i',`j'] = sqrt(`S'[`i',`i'])*sqrt(`S'[`j',`j'])*tanh(r(mean))
matrix `S'[`j',`i'] = `S'[`i',`j']
}
}
** if `S' is not positive definite, fall back to the covariance matrix
** from the previous step (CC still holds the last successful factor)
capture matrix CC = cholesky(`S')
if _rc != 0 {
matrix `S' = CC*CC'
}
*** compute the likelihood ***
** define a list of names so that -matrix score- lines up with the residuals **
local nam=""
forvalues i = 1/$n_eq {
local nam `nam' `e`i''
}
** compute the (x-mu)InvS(x-mu)' part of the likelihood **
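** (the loop accumulates e_i * [invS * e]_i over rows, i.e. e' invS e) **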
mat `invS' = invsym(`S')
gen double `ip'=0
forvalues i = 1/$n_eq {
tempvar g`i'
matrix `is' = `invS'[`i',1...]
matrix colnames `is' =`nam'
quietly matrix score `g`i''=`is'
quietly replace `ip'= `ip' + `e`i''*`g`i''
}
quietly replace `lnf' = -$n_eq/2*log(2*_pi) - 1/2*log(det(`S')) - 0.5*`ip'
end
ml model lf multireg_lf (mu1: y1 = x1 x2) (mu2: y2 = x1 x2) (mu3: y3 = x1 x2) ///
(lnsigma1: x1) (lnsigma2: x1) (lnsigma3:) ///
(ar12:) (ar13:) (ar23:), technique(bhhh dfp)
quietly regress y1 x1 x2
mat b1 = e(b)
mat coleq b1 = mu1
quietly regress y2 x1 x2
mat b2 = e(b)
mat coleq b2 = mu2
quietly regress y3 x1 x2
mat b3 = e(b)
mat coleq b3 = mu3
mat b0 = b1,b2,b3
ml init b0
ml maximize, difficult
----------- end example --------------
**************************************************
Carlo Fezzi
Senior Research Associate
Centre for Social Research on the Global Environment (CSERGE)
Department of Environmental Sciences
University of East Anglia
Norwich (UK) NR2 7TJ
Telephone: +44(0)1603 591385
Fax: +44(0)1603 593739
*************************************************
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*