Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Splitting Dataset - Save by unique identifier
From
Daniel Feenberg <[email protected]>
To
[email protected]
Subject
Re: st: Splitting Dataset - Save by unique identifier
Date
Sun, 28 Oct 2012 09:56:24 -0400 (EDT)
On Sat, Oct 27, 2012 at 5:28 PM, Tim Streibel <[email protected]> wrote:
Hey all,
I have a question. I am currently computing abnormal returns in a way that requires opening a large dataset (about 2m obs.) about 400 times, which I think costs a lot of time.
So my idea is to create small datasets (one dataset per stock). Is there a way to quickly create a dataset containing only the observations of one stock (uniquely identified by Permno)?
Currently my only idea is to open the large dataset, drop all obs. except those of one stock, and save it. But doing that for every stock forces me to open the large dataset 10,000 times, so it doesn't really save me time.
Some combination of by (permno) and save would be cool.
While -save- does not allow -if- or -in- qualifiers, -outsheet- does.
Depending on the exact details of your dataset, the overhead of converting
to text (and reading it back with -insheet-) might be worthwhile. Of
course, -by- would be even better, but I don't see how to get that
advantage. Just reducing the I/O with -outsheet- will likely be a big
help, though.
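A minimal sketch of the -outsheet- idea, assuming the large dataset is already in memory and that permno is numeric. The file names stock_<permno>.csv are made up for illustration. The dataset is read from disk once; each pass only writes the rows for one stock.

```stata
* Sketch only: one comma-separated file per stock, written in a single session.
* Assumes the large dataset is in memory and permno is a numeric identifier.
levelsof permno, local(stocks)
foreach p of local stocks {
    * -outsheet- accepts -if-, so no -drop- or reload is needed
    outsheet using "stock_`p'.csv" if permno == `p', comma replace
}
```

Each small file can later be read back with something like -insheet using "stock_10001.csv", comma clear- (the permno value here is just an example). Note that every -if- still scans all observations in memory, so this saves disk reads rather than passes over the data.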
Note that rules of thumb (such as avoiding looping over Stata
statements) are only rules of thumb, and when datasets get very large,
they may no longer hold. In your case I might examine the possibility
of using the -file open- and -file write- statements in a double loop.
It might be worth the programming effort, depending on how often you
will want to do this.
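One way the -file open- / -file write- double loop might look, as a sketch rather than a tested solution. It assumes numeric variables named permno, date and ret (the last two are placeholder names, not taken from the question) and makes a single pass over the data after sorting. The output file names are also hypothetical.

```stata
* Sketch only: one pass over the sorted data, one text file per stock.
* Assumes numeric variables permno, date and ret (hypothetical names).
sort permno date
local N = _N
local i = 1
while `i' <= `N' {
    * outer loop: start a new file for the stock found at observation i
    local p = permno[`i']
    file open fh using "stock_`p'.txt", write text replace
    file write fh "permno" _tab "date" _tab "ret" _n
    * inner loop: write rows until the stock changes or the data end
    while `i' <= `N' & permno[`i'] == `p' {
        file write fh (permno[`i']) _tab (date[`i']) _tab (ret[`i']) _n
        local ++i
    }
    file close fh
}
```

Whether this beats the -outsheet- loop would have to be timed on the actual data, since looping over 2m observations in interpreted code has its own cost. Values are written in the default numeric format, so an explicit format may be needed to keep full precision.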
daniel feenberg
NBER