On Thursday, Jul 17, 2003, at 02:33 US/Eastern, Michael wrote:
Instead, I've written .ado code that uses -in- instead of -if- to identify
groups and find that it usually cuts down the calculation time by 85%-95% in
most fairly large datasets. The code is hardwired to use -regress- and saves
a pre-determined list of statistics into the current dataset (repeated
across each panel). Because of these limitations, I haven't posted it to
SSC. I've been planning to change it to be a virtually identical standin
for statsby (perhaps called statsbyin or statsbyfast) but haven't gotten
around to it.

I have found the same thing when assisting one of my colleagues who was
working with a very large panel dataset. The difference in speed
between -in- and -if- is tremendous, and logically so: -if- must
evaluate its condition for every observation in the dataset on each
call, including those belonging to units you have already processed,
whereas -in- goes straight to the requested range. What we worked out
for him (in the context of an unbalanced panel, with the data sorted by
unit) was a counter that tracked the first and last observation of each
unit; the -in- clause then just looped over that counter. If you have a
balanced panel, it is even easier: you just create a couple of counters
and add T to them each time around the loop (which could be done with a
single forvalues statement). Writing your own loop, and taking care of
the minor housekeeping needed to stash the results of estimation in a
convenient place, will save you a lot of time overall.
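
A minimal sketch of the kind of loop described above, not the code we
actually used. The variable names (id, y, x, first_obs, last_obs, b_x,
b_bal) and the value T = 10 are made up for illustration, and -regress-
with a single saved coefficient stands in for whatever estimation and
housekeeping you need:

```stata
* Unbalanced panel: record each unit's first and last observation number,
* then step through the data one unit at a time using -in-.
sort id
generate long obsno = _n
by id: generate long first_obs = obsno[1]    // absolute obs number of unit's first row
by id: generate long last_obs  = obsno[_N]   // absolute obs number of unit's last row
generate double b_x = .

local i = 1
while `i' <= _N {
    local lo = first_obs[`i']
    local hi = last_obs[`i']
    capture regress y x in `lo'/`hi'         // capture: skip units where -regress- fails
    if _rc == 0 {
        quietly replace b_x = _b[x] in `lo'/`hi'
    }
    local i = `hi' + 1                       // jump to the first row of the next unit
}

* Balanced panel: every unit has exactly T rows (T = 10 is an assumption),
* so a single -forvalues- stepping by T covers the whole dataset.
local T = 10
local N = _N
generate double b_bal = .
forvalues lo = 1(`T')`N' {
    local hi = `lo' + `T' - 1
    capture regress y x in `lo'/`hi'
    if _rc == 0 {
        quietly replace b_bal = _b[x] in `lo'/`hi'
    }
}
```

Both loops rely on the data being sorted so that each unit occupies a
contiguous block of observations; the balanced version also assumes the
number of observations is an exact multiple of T.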
Kit