st: Re: in not if for large panels

From	Kit Baum <[email protected]>
To	[email protected]
Subject	st: Re: in not if for large panels
Date	Thu, 17 Jul 2003 09:57:31 -0400

On Thursday, Jul 17, 2003, at 02:33 US/Eastern, Michael wrote:

Instead, I've written .ado code that uses -in- instead of -if- to identify
groups and find that it usually cuts down the calculation time by 85%-95% in
most fairly large datasets. The code is hardwired to use -regress- and
saves a pre-determined list of statistics into the current dataset (repeated
across each panel). Because of these limitations, I haven't posted it to
SSC. I've been planning to change it to be a virtually identical stand-in
for statsby (perhaps called statsbyin or statsbyfast) but haven't gotten
around to it.

I found the same thing when helping a colleague who was working with a very large panel dataset. The difference in speed between -in- and -if- is tremendous, and logically so: -if- must evaluate its condition on every observation in the dataset, including those belonging to units you have already processed, whereas -in- goes straight to the requested range of observations. What we worked out for him (for an unbalanced panel) was a pair of counters that tracked the first and last observation of each unit, with the data sorted by unit; the loop then ran the command -in- that range for each unit in turn. If you have a balanced panel, it is even easier: you create a couple of counters and add T to them each time around the loop (which could be done with a single -forvalues- statement). Writing your own loop, and taking care of the minor housekeeping needed to stash the estimation results in a convenient place, will save you a lot of time overall.

Kit
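
The two loops described above might be sketched roughly as follows. This is only an illustration, not the code written for the colleague or Michael's .ado file: the simulated data, the variable names (id, x, y, lastobs, b_bal, b_unbal) and the choice to store only the slope from -regress- are all assumptions made for the example, and -rnormal()- assumes a reasonably recent Stata.

```stata
* Hypothetical data: 20 units with T = 10 observations each.
clear
set seed 2003
set obs 200
generate long id = ceil(_n/10)
generate double x = rnormal()
generate double y = 1 + 2*x + rnormal()
sort id

* Balanced panel: step through the data T observations at a time,
* using a single -forvalues- over the first observation of each unit.
local T = 10
generate double b_bal = .
forvalues first = 1(`T')`=_N' {
    local last = `first' + `T' - 1
    quietly regress y x in `first'/`last'
    quietly replace b_bal = _b[x] in `first'/`last'
}

* Unbalanced panel: drop some rows so units differ in length, then
* record the observation number at which each unit ends.
drop if mod(_n, 7) == 0
sort id
generate long obsno = _n
by id: generate long lastobs = obsno[_N]

* Walk through the units with a pair of counters (first, last).
generate double b_unbal = .
local first = 1
while `first' <= _N {
    local last = lastobs[`first']
    quietly regress y x in `first'/`last'
    quietly replace b_unbal = _b[x] in `first'/`last'
    local first = `last' + 1
}
```

Both loops rely on the data being sorted so that each unit occupies one contiguous block of observations; -in- only selects a range and knows nothing about the panel identifier. Each unit's slope is repeated across its observations, as in the approach Michael describes.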


Follow-Ups:
- st: announcing statsbyfast - a faster statsby
  - From: "Michael Blasnik" <[email protected]>
