Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: A faster way to gsort

From	Clyde Schechter <[email protected]>
To	[email protected]
Subject	Re: st: A faster way to gsort
Date	Fri, 14 Mar 2014 07:59:18 -0700

Joe Canner mused about the conjecture that any sort algorithm would
work faster if the data are already sorted, or nearly so.

That is probably true of most sort algorithms that are actually used
in production software.  But it is not true of all sort algorithms.

In particular, one possible sort algorithm is to read the data
sequentially unto an unbalanced binary branching tree, and then read
them out in sorted order.  This algorithm has the astonishing (to me,
at least) property that its expected performance is O(n log n), but,
if the data are already sorted, its performance is O(n^2).  In fact,
an already sorted data set is this algorithm's worst case scenario.

I don't think this approach to sorting is actually used in any
databases or statistical packages.  But it's an interesting curiosity.

Clyde Schechter
Department of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

Prev by Date: st: RE: RE: Extracting results from -rolling- in one file per window
Next by Date: st: RE: RE: RE: Extracting results from -rolling- in one file per window
Previous by thread: Re: large data sets (was st: A faster way to gsort)
Next by thread: st: Calculating group average of the other group I am not a member of
Index(es):
- Date
- Thread