Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: A faster way to gsort
From
Andrew Maurer <[email protected]>
To
Statalist Statalist <[email protected]>
Subject
st: A faster way to gsort
Date
Tue, 11 Mar 2014 15:49:59 +0000
Hi Statalist,
I've been finding that sorting is taking up a very significant portion of computation time in code that I'm writing (eg for 6hrs run time, 5.5hrs is spent sorting). I am looking for ways to do better than Stata's sort and gsort commands.
I just read about mata's use of permutation vectors (eg --help mata permutation--) and found that in at least the case below, they can be used for reverse sorting much faster than stata's --gsort--. gsort was 10x slower, at around 0.65 seconds vs the permutation method at around 0.065 seconds. (with 10^6 observations, times were ~7 seconds vs 1.4 seconds). I suppose there isn't a question here, but I'd be interested to hear if anyone has any thoughts.
************* Define Programs ***********
clear all
program revsort
// caller program for mata revsort()
tempvar id
gen long `id' = _n
mata : revsort("`id'")
end
mata
void revsort(string scalar id)
// "id" should be dataset's current sort index
// revsort() just reverses the dataset's order
{
st_view(V=.,.,.)
idpos = st_varindex(id)
V[.,.] = V[revorder(V[.,idpos]),.]
}
end
************* End Define Programs *******
************* Run Benchmark *************
/*
benchmarking gsort vs "revsort"
(mata permutation method using revorder())
*/
tempfile temp
local N = 10^5
set obs `N'
gen x = runiform()
save `temp'
forval i = 1/5 {
use `temp', clear
timer on `i'
gsort -x
timer off `i'
}
forval i = 11/15 {
use `temp', clear
timer on `i'
sort x
revsort
timer off `i'
}
timer list
************* End Benchmark *************
*** Output ***
* 1) gsort
1: 0.67 / 1 = 0.6700
2: 0.67 / 1 = 0.6710
3: 0.75 / 1 = 0.7490
4: 0.66 / 1 = 0.6550
5: 0.66 / 1 = 0.6550
* 2) revsort
11: 0.06 / 1 = 0.0620
12: 0.09 / 1 = 0.0940
13: 0.06 / 1 = 0.0620
14: 0.06 / 1 = 0.0620
15: 0.06 / 1 = 0.0630
* End Output *
Thank you,
Andrew Maurer
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/