Both Paul Visintainer and James Rosenthal are interested in processing times
for fitting mixed models to large datasets.
Paul asks:
> A colleague asked me about Stata's (ver 9.0) ability to run a mixed model
> with 4 levels on a database with about 1 million records. If anyone has
> run something close to this scenario, I'd appreciate your input.
> I'd like to know how long Stata took to run the model and the
> configuration of the machine it was run on (I assume its best to load as
> much memory as the machine can take).
The output at the bottom of this email shows a mixed model fit to 1.12 million
observations, with 4 levels of nested random effects and a random intercept at
each level. Fitting the model took about 53 minutes on a 2.6 GHz Pentium 4
with 1 GB of RAM, running Fedora Core Linux.
Of course, timings depend not only on the machine but also on the exact
configuration of the 4 grouping levels, the number of fixed effects, the
random-effects design, etc. Your mileage will vary.
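To get a feel for timings on your own machine, one option is to simulate a
scaled-down dataset with the same nesting structure and time the fit. The
do-file below is only a sketch: the group counts, the standard deviations, and
the seed are illustrative assumptions (loosely echoing the estimates in the
output at the bottom of this email), not the data I actually used.

```stata
* Sketch only: simulate a small 4-level nested dataset and time the fit.
* All sizes and parameter values below are illustrative assumptions.
clear
set seed 12345

* 5 top-level groups, each with its own random intercept
set obs 5
generate level1 = _n
generate u1 = 0.6*invnorm(uniform())

* 5 level2 groups within each level1 group
expand 5
bysort level1: generate level2 = _n
generate u2 = 0.5*invnorm(uniform())

* 5 level3 groups within each level2 group
expand 5
bysort level1 level2: generate level3 = _n
generate u3 = 0.5*invnorm(uniform())

* 5 level4 groups within each level3 group
expand 5
bysort level1 level2 level3: generate level4 = _n
generate u4 = 0.5*invnorm(uniform())

* 7 observations within each level4 group
expand 7
generate x1 = invnorm(uniform())
generate y = -0.8 + 0.5*x1 + u1 + u2 + u3 + u4 + 0.7*invnorm(uniform())

* set rmsg on reports the elapsed time after each command
set rmsg on
xtmixed y x1 || level1: || level2: || level3: || level4:
set rmsg off
```

Increasing the -expand- factors moves the simulated data toward the size of
your real problem, which gives a rough idea of how the timing scales.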
Also note that what I have below is a 4-level model in Stata parlance, or
equivalently a 5-level model in the terminology of -gllamm- (and of other
hierarchical linear models literature).
James asks:
> I have a much smaller problem (15,000 records with 3 or 4 levels) that
> SPSS MIXED runs out of memory on. HLM handles nicely, but I cannot
> incorporate a 4th level.
> If I knew STATA could handle problem, I might well upgrade to 9.0.
Since your problem is organized by "levels" (of nested random effects,
presumably), this shouldn't be a problem either memorywise or speedwise. Stata
takes advantage of the nesting to keep the dimension of the design matrix low,
and is thus less demanding on memory.
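To see why the nesting matters, take the group counts from the output at the
bottom of this email. A naive dense random-effects design matrix would need one
indicator column per group at every level. The arithmetic below is a
back-of-the-envelope illustration only, assuming 8 bytes per stored value; it
is not a description of how -xtmixed- is implemented internally.

```stata
* Columns in a dense indicator matrix: one per group at each of the 4 levels
display 20 + 400 + 8000 + 160000

* Approximate storage in gigabytes for 1,120,000 rows at 8 bytes per value
display %9.0f 1120000*(20 + 400 + 8000 + 160000)*8/1e9
```

That works out to 168,420 columns and roughly 1.5 terabytes, which no machine
with 1 GB of RAM could hold, so exploiting the nested structure is essential.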
--Bobby
----------------------------begin xtmixed output------------------------------
. xtmixed y x1 || level1: || level2: || level3: || level4:, emlog
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log restricted-likelihood = -1333058.8
Iteration 1: log restricted-likelihood = -1333058.8
Computing standard errors:
Mixed-effects REML regression                   Number of obs      =   1120000

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
         level1 |       20      56000    56000.0      56000
         level2 |      400       2800     2800.0       2800
         level3 |     8000        140      140.0        140
         level4 |   160000          7        7.0          7
-----------------------------------------------------------

                                                Wald chi2(1)       = 493942.25
Log restricted-likelihood = -1333058.8          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .4990131     .00071   702.81   0.000     .4976215    .5004048
       _cons |  -.7875853   .1372061    -5.74   0.000    -1.056504   -.5186663
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
level1: Identity             |
                   sd(_cons) |   .6027093   .1013493       .433484    .8379976
-----------------------------+------------------------------------------------
level2: Identity             |
                   sd(_cons) |   .5019323   .0191549      .4657591    .5409149
-----------------------------+------------------------------------------------
level3: Identity             |
                   sd(_cons) |   .4958665   .0042854       .487538    .5043373
-----------------------------+------------------------------------------------
level4: Identity             |
                   sd(_cons) |   .5001539   .0011706      .4978648    .5024535
-----------------------------+------------------------------------------------
                sd(Residual) |   .7069941   .0005102      .7059947    .7079948
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(4) =  1.0e+06   Prob > chi2 = 0.0000
-----------------------------end xtmixed output-------------------------------