Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: -areg- and -eststo- performance (page faults?) on OS X Lion
From
James Sams <[email protected]>
To
[email protected]
Subject
st: -areg- and -eststo- performance (page faults?) on OS X Lion
Date
Sun, 01 Jan 2012 23:08:50 -0600
For reference, my environment:
Stata 12 MP
OS X 10.7 Lion
32 GB RAM
4 cores
12 GB dataset (on disk size)
I'm trying to run areg for several models, each with several dependent
variables, across 44 groups of unequal size. In theory, I should be able
to do just
by subgroup: eststo ...: areg ...., absorb(id)
for each model and dependent variable. However, that works out to about
900 eststo's, and Stata only allows 300. So I have broken it into a loop
that reads in only the subset of the data for as many groups as will fit
in eststo's available space, then loops over those groups one by one,
running areg on each and storing the results (in terms of customizing
titles and such, this turns out to be more convenient than using "by
subgroup" anyway). This is fast at first, but Stata eventually gets
slow, very slow. OS X's top indicates a very large number of page faults
(306,914,721 over roughly the past 30 hours of continuous running). I
assume this is the problem but don't really understand it. The memory
usage looks like this:
RPRVT   RSHRD   RSIZE   VPRVT   VSIZE
8268M   216K    8280M   8351M   10G
which should be more than enough to find contiguous portions of memory.
I am using the noesample option to eststo. Sometimes top shows Stata as
"stuck" with approximately 7% CPU usage; other times it is classified as
"running" and uses 400% of CPU. Nothing else significant is running on
the box (I'm running from the command line, and the system is left at
the GUI login). I don't have good data on how long the process spends in
each state, except that it has now been running for 30 hours, while my
rudimentary testing led me to expect something in the neighborhood of
12 hours.
Speeding this up is important because (a) I'll need to run similar
regressions on these same groups many times, and the current run time is
just too long; and (b) I'd like to run this on a separate classification
of groups that would give me 1,000 groups (so thousands of regressions).
That run might be faster, since that classification may require less
memory, but that is not yet certain.
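
In outline, the loop looks roughly like this (the variable, file, and
group names are placeholders rather than my actual code, and the chunk
size is simplified to a flat count of 300):

```stata
* Sketch of the chunked loop (names are placeholders, not the real code)
local chunk = 0
foreach g of local subgroups {
    * read in only this group's subset of the data
    use if subgroup == `g' using mydata, clear
    eststo est`g', noesample: areg depvar x1 x2, absorb(id)
    local ++chunk
    if `chunk' == 300 {
        * write out the stored results, then clear to stay under the
        * 300-estimate limit before starting the next chunk
        esttab est* using results`g'.csv, replace
        eststo clear
        local chunk = 0
    }
}
```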
Any and all advice is appreciated.
--
James Sams
[email protected]
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/