Thanks to all who have pointed this out, including Roberto G. Gutierrez
who was first, but off list.
You are not wrong about the speed: 8 hours in gllamm, 4 minutes in
xtpoisson!! (in Stata/MP4)
But it's disturbing how different the results can be. In this example
(suggested by RGG), the variance estimates don't agree to even one
significant figure on what I think are equivalent models, or aren't they?
webuse ships, clear
gen logserv=ln(service)
glo X op_75_79 co_65_69 co_70_74 co_75_79
xtset ship
xtpoisson accident $X, offset(logserv) normal             // takes 0.14 seconds on my pc
gllamm accident $X, fam(poisson) offset(logserv) i(ship)  // takes 5.36 seconds on my pc
~~~~~~~~~ xtpoisson results ~~~~~~~~~
------------------------------------------------------------------------------
    accident |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    op_75_79 |   .3830105    .118253     3.24   0.001     .1512389    .6147821
    co_65_69 |   .7093762    .149593     4.74   0.000     .4161794    1.002573
    co_70_74 |   .8576789   .1693625     5.06   0.000     .5257346    1.189623
    co_75_79 |   .4992132   .2317164     2.15   0.031     .0450574     .953369
       _cons |  -6.640989   .2067838   -32.12   0.000    -7.046278     -6.2357
     logserv |   (offset)
-------------+----------------------------------------------------------------
    /lnsig2u |  -2.352979   .8583287    -2.74   0.006    -4.035272   -.6706858
-------------+----------------------------------------------------------------
     sigma_u |   .3083593   .1323368                      .1329694    .7150928
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01) = 10.67  Pr>=chibar2 = 0.001
~~~~~~~~~ gllamm results ~~~~~~~~~
------------------------------------------------------------------------------
    accident |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    op_75_79 |   .3849786   .1182184     3.26   0.001     .1532747    .6166824
    co_65_69 |   .7058854   .1495483     4.72   0.000      .412776    .9989947
    co_70_74 |    .847284   .1692169     5.01   0.000     .5156249    1.178943
    co_75_79 |   .4940048   .2301141     2.15   0.032     .0429894    .9450201
       _cons |  -6.724426    .140161   -47.98   0.000    -6.999137   -6.449716
     logserv |   (offset)
------------------------------------------------------------------------------

Variances and covariances of random effects
------------------------------------------------------------------------------
***level 2 (ship)

    var(1): .17662891 (.09378635)
------------------------------------------------------------------------------
My (quite likely wrong) understanding of these results is that
exp(-2.352979) = 0.095085 (that is, sigma_u^2 from xtpoisson) and the
gllamm var(1) of .17662891 are estimates of the same variance parameter,
which is a bit worrying. I take it the value (.09378635) is the SE of the
gllamm variance estimate, and it's just a coincidence that it happens to
be close to the xtpoisson variance estimate.
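For what it's worth, a quick check of the arithmetic behind the
correspondence I am assuming (numbers copied from the output above):

display exp(-2.352979)   // = .09508.., exp(/lnsig2u) from xtpoisson
display .3083593^2       // = .09508.., i.e. sigma_u^2, the same quantity
display .17662891        // gllamm's var(1), nearly twice as large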
Increasing the nip() parameter of gllamm from the default 8 to 19
changes the 0.1766.. value to 0.3529.., which suggests to me that the
xtpoisson result is perhaps more reliable (it also doubles the execution
time to 10.31 sec). Can someone more expert confirm and/or explain? We
know that precise is not the same as accurate, so perhaps invariant is
also not to the point.
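Something like the following reproduces that check; the second line, with
-adapt- (adaptive quadrature), is an extra variation I have not run but
which might also be worth comparing:

gllamm accident $X, fam(poisson) offset(logserv) i(ship) nip(19)        // non-adaptive, 19 points
gllamm accident $X, fam(poisson) offset(logserv) i(ship) nip(19) adapt  // adaptive quadrature (untried)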
Thanks
Keith
Jeph Herrin wrote:
If you have a single random effect, you may find -xtpoisson-
is even faster than -xtmepoisson-.
hth,
Jeph
Keith Dear (home) wrote:
Ummm ... no (well, NOW I have).
Except on the uni supercomputer, we only have Stata9, hence
ignorance. Time to upgrade!
Many thanks Martin.
Keith
ps
http://www.stata.com/help.cgi?xtmepoisson
http://stata.com/stata10/mixedmodels.html
Martin Weiss wrote:
<>
Have you looked into -xtmepoisson-?
HTH
Martin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On behalf of Keith Dear
(work)
Sent: Wednesday, 24 June 2009 08:01
To: [email protected]
Cc: Ainslie Butler
Subject: st: gllamm (poisson) execution time
We are trying to model daily mortality by Poisson regression, over
17 years, by postcode, with postcode as a single random intercept term.
In Stata10/MP4 on a linux cluster our models each take 7 or 8 hours
to fit, which is too long to be feasible for exploratory analyses.
The full dataset has >14 million rows of data: a row for every day
for 1991-2007 for every postcode in Australia (~2200 postcodes), but
to get things moving we are starting with smaller geographical
regions of only 100 or 200 postcodes. Thus N=17*365*(100 or 200),
about a half or one million. Also we are starting with fairly
simple models, p=17 fixed-effect parameters just for trend and
annual cycles. The models converge ok, eventually, in only a few
iterations and with typical condition number about 2.
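For concreteness, a minimal sketch of the kind of call involved (the
variable and macro names below are made up for illustration, not our
actual code):

* deaths = daily death count, postcode = area identifier,
* $TREND = the 17 trend and annual-cycle terms (all hypothetical names)
gllamm deaths $TREND, fam(poisson) i(postcode) nip(5)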
I found this in the list archives (from Sophia Rabe-Hesketh in 2003):
==> biggest gain is to reduce M, followed by n, p and N
Here we have M=1 (random effects), n=5 integration points (down from the
default of 8), p=17 parameters, but N=6E5 observations
or more. There does not seem to be much prospect of reducing any of
those, indeed we will need to substantially increase p (for more
interesting models) and N (to cover all of Australia at once).
Is there hope? Are there alternatives to gllamm for this? Or are we
overlooking something basic here?
Keith
--
Dr Keith Dear
Senior Fellow
National Centre for Epidemiology and Population Health
ANU College of Medicine, Biology and Environment
Building 62, cnr Mills and Eggleston Roads
Australian National University
Canberra ACT 0200 Australia
T: 02 6125 4865
F: 02 6125 0740
M: 0424 450 396
W: nceph.anu.edu.au/Staff_Students/staff_pages/dear.php
CRICOS provider #00120C
http://canberragliding.org/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/