Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: still for Stata 14: a cache of sorted orderings for big data
From: László Sándor <[email protected]>
To: [email protected]
Subject: st: still for Stata 14: a cache of sorted orderings for big data
Date: Thu, 12 Sep 2013 17:38:47 -0400
Following up on my previous note: I think -sort- is just as bad an
idea for big data as a preserve-and-restore cycle. I could imagine an
option that lets Stata keep the last few sort orderings (ranks) in
memory, even though that costs some memory. Before sorting, Stata
would check whether the sorting variables have changed, or whether a
different sample restriction calls for a new sort. If neither is the
case, it would simply restore the cached ordering.
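
To make the idea concrete, here is a minimal sketch in Python rather
than Stata. It illustrates the caching logic only and says nothing
about how Stata works internally. The class SortCache, its methods,
and the per-variable version counters are all hypothetical names made
up for this example, and sample restrictions are not modeled.

```python
from collections import OrderedDict


class SortCache:
    """Hypothetical cache of the last few sort orderings of a dataset."""

    def __init__(self, columns, max_entries=3):
        # columns: dict mapping variable name -> list of values
        self.columns = {name: list(vals) for name, vals in columns.items()}
        # A version counter per variable, bumped whenever it is replaced.
        self.versions = {name: 0 for name in self.columns}
        self.max_entries = max_entries
        # Maps a tuple of sort keys -> (versions at sort time, permutation).
        self._cache = OrderedDict()
        self.sorts_computed = 0

    def replace(self, name, values):
        """Overwrite a variable; cached orderings that use it become stale."""
        self.columns[name] = list(values)
        self.versions[name] = self.versions.get(name, 0) + 1

    def order(self, *keys):
        """Return the row permutation that sorts the data by the given keys."""
        current = tuple(self.versions[k] for k in keys)
        entry = self._cache.get(keys)
        if entry is not None and entry[0] == current:
            # Sort variables unchanged: reuse the stored ordering.
            self._cache.move_to_end(keys)
            return entry[1]
        # Otherwise pay for a real sort and remember the result.
        n = len(self.columns[keys[0]])
        perm = sorted(
            range(n),
            key=lambda i: tuple(self.columns[k][i] for k in keys),
        )
        self.sorts_computed += 1
        self._cache[keys] = (current, perm)
        self._cache.move_to_end(keys)
        # Keep only the most recently used orderings to bound memory use.
        while len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)
        return perm
```

Calling order("id") twice in a row would sort only once. After
replace("id", ...) the next call would sort again, because the stored
version number no longer matches.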
I see why sorting helps -by- (and perhaps even -tab-) and is essential
for other tools like -xtile- or -mkspline-, but it is still a drag
with tens of millions of observations.
Thanks,
Laszlo