Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Splitting Dataset - Save by unique identifier

From	Daniel Feenberg <[email protected]>
To	[email protected]
Subject	Re: st: Splitting Dataset - Save by unique identifier
Date	Sun, 28 Oct 2012 09:56:24 -0400 (EDT)

On Sat, Oct 27, 2012 at 5:28 PM, Tim Streibel <[email protected]> wrote:

Hey all,

I have a question. I am currently computing abnormal returns in a way that requires opening a large dataset (about 2m obs.) about 400 times, which I think costs a lot of time.

So my idea is to create small datasets (one dataset per stock). Is there a way to quickly create a dataset containing only the observations of one stock (uniquely identified by Permno)?

Currently my only idea is to open the large dataset, drop all obs. except those of one stock, and save the result. But doing that for every stock forces me to open the large dataset 10 000 times, so it doesn't really save me time.

Some combination of by (permno) and save would be cool.

While -save- does not allow -if- or -in- qualifiers, -outsheet- does. Depending on the exact details of your dataset, the conversion overhead might be worthwhile. Of course, -by- would be even better, but I don't see how to get that advantage. Just reducing the i/o with outsheet will likely be a big help, though.
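A minimal sketch of the -outsheet- idea, for illustration only. It assumes the large dataset is already in memory, that the identifier variable is named permno and is numeric, and that comma-separated output is acceptable; the file name stem "stock_" is a hypothetical choice, not something from the original post.

```stata
* Sketch: write one comma-separated file per stock without reloading the data.
* Assumes the full dataset is in memory and permno is a numeric identifier.
* The output file names (stock_<permno>.csv) are hypothetical.
levelsof permno, local(stocks)

foreach p of local stocks {
    * -outsheet- accepts an -if- qualifier, so no -drop- or -preserve- is needed
    outsheet using "stock_`p'.csv" if permno == `p', comma replace
}
```

The large dataset is read from disk only once, but each -if- still evaluates the condition over every observation in memory, so whether this is fast enough for roughly 10 000 stocks is worth timing on a subset first. The resulting files are text, so they would have to be read back with -insheet- rather than -use-.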


Note that rules of thumb (such as avoiding looping over Stata
statements) are only rules of thumb, and when datasets get very large,
they may no longer hold. In your case I might examine the possibility
of using the -file open- and -file write- statements in a double loop.
It might be worth the programming effort, depending on how often you
will want to do this.
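One way the double loop with -file open- and -file write- might look is sketched below. This is an illustration under stated assumptions, not a tested solution for the original data: it assumes permno is numeric and nonmissing, that the variables to export are all numeric, and it uses the hypothetical variable names date and ret and the hypothetical file names stock_<permno>.csv. The outer loop runs over observations, the inner loop over variables.

```stata
* Sketch: a single pass over the sorted data, one text file per stock.
* Assumes permno is numeric and nonmissing; date and ret are hypothetical
* numeric variables standing in for whatever needs to be exported.
local vars permno date ret
sort permno date

local N = _N
local prev = .
forvalues i = 1/`N' {
    local p = permno[`i']
    if `p' != `prev' {
        * a new stock starts here: close the previous file, open the next one
        if `i' > 1 {
            file close fh
        }
        file open fh using "stock_`p'.csv", write text replace
        file write fh "permno,date,ret" _n
        local prev = `p'
    }
    * inner loop: write each variable's value for observation i
    local sep ""
    foreach v of local vars {
        file write fh "`sep'" (`v'[`i'])
        local sep ","
    }
    file write fh _n
}
if `N' > 0 {
    file close fh
}
```

Because the data are sorted by permno, each observation is visited exactly once and each output file is opened exactly once. Values are written in Stata's default numeric display format, so a format directive such as %12.0g before the expression may be needed where full precision matters. An interpreted loop over 2m observations can itself be slow, so it is worth timing against the -outsheet- approach before committing to either.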

daniel feenberg
NBER