I often need to do the same kind of thing on even larger datasets and have
found that commands like -statsby- (which use the construct "if bygroup==X")
are very slow when there are many panels. Each -if- qualifier requires a
comparison for every observation in the dataset, and that comparison is
repeated for every panel. You could speed -statsby- up considerably by
splitting the data into a series of smaller datasets and combining the
results, but that approach seems kind of silly, and tedious unless you
write your own wrapper for it.
Instead, I've written .ado code that uses -in- instead of -if- to identify
groups, and I find that it usually cuts calculation time by 85%-95% in
fairly large datasets. The code is hardwired to use -regress- and saves a
predetermined list of statistics into the current dataset (repeated within
each panel). Because of these limitations, I haven't posted it to SSC. I've
been planning to turn it into a virtually identical stand-in for -statsby-
(perhaps called statsbyin or statsbyfast) but haven't gotten around to it.
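To illustrate the -in- idea, here is a minimal sketch. It is not the actual
.ado code, and the names id, y, x, grpsize, b_x, and b_cons are placeholders.
It assumes that once the data are sorted by the panel identifier, each panel
occupies a contiguous block of observations, so -in- can address that block
by position rather than testing every observation:

```stata
* Sketch only: id, y, x, grpsize, b_x, and b_cons are placeholder names.
sort id
* number of observations in each panel, repeated on every row of the panel
by id: gen long grpsize = _N
gen double b_x = .
gen double b_cons = .

local i = 1
while `i' <= _N {
    * last observation of the panel that starts at row i
    local j = `i' + grpsize[`i'] - 1
    * -in- restricts the regression to rows i through j only
    capture regress y x in `i'/`j'
    if _rc == 0 {
        quietly replace b_x = _b[x] in `i'/`j'
        quietly replace b_cons = _b[_cons] in `i'/`j'
    }
    * jump to the first row of the next panel
    local i = `j' + 1
}
drop grpsize
```

The -capture- is there so that a panel with too few observations leaves
missing values behind instead of stopping the loop.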
If you want to see it or use it or adopt it for your needs, I can email it
directly to you.
Michael Blasnik
[email protected]
----- Original Message -----
From: <[email protected]>
To: <[email protected]>
Sent: Wednesday, July 16, 2003 3:31 AM
Subject: st: Using statsby command for large panel data set
> Hello
> I have a large unbalanced panel data set (observations on over 8000 firms
> for up to 135 periods). I want to undertake some simple time-series
> regressions for each firm and access the estimation results. The statsby
> command seems to be the appropriate command. But given the number of
> regressions, it takes a very long time to execute - in fact my impatience
> has got the better of me on every occasion I have tried to run the command
> on the full data set and I have interrupted the procedure. One solution may
> be to be more patient! Another I thought of was to split the data set up
> into smaller units, run the statsby command on each of the smaller data sets
> and then merge the estimation results to give the required outcome. But I
> wondered if there were other ways that people have tried, or whether there
> are ways to speed up the procedures. I want to use the estimated
> coefficients from the regressions to create transformations of the original
> variables, so ideally I would like to work with the full dataset. For info.
> I am using Stata/SE 8.1
>
> Thanks in advance for any advice.
>
> Darren
>