Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Mata order() indeterminate
From
[email protected] (Vince Wiggins, StataCorp)
To
[email protected]
Subject
Re: st: Mata order() indeterminate
Date
Mon, 04 Nov 2013 16:22:41 -0600
Brendan Halpin <[email protected]> asks how to obtain a stable
ordering (sort) of tied values when using the -order()- function in
Mata.
Brendan provides an example, see below, where the ordering of ties is
done differently each time the functions are run.
As Joseph Coveney <[email protected]> discusses, this randomness in
ordering tied values is intentional. It keeps us from seeing
consistency where there is none.
That said, we have no problem with someone knowingly breaking ties
consistently. A surefire solution is to create another column in your
ordering matrix that has a sequential ordering, then include that
column when using -order()-. If you were ordering on the first column
of matrix y you would type -order(y, 1)-. After inserting a second
column that has sequential values, you would type -order(y, 1..2)-.
That solution makes it explicit how you want your ties broken.
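As a minimal sketch of the tie-breaking column described above: the
values in y and the names yt and p are made up for illustration, and
only -order()- and the :: range operator are assumed.

```mata
mata:
// y: the column to sort on, with deliberate ties (illustrative values)
y = (1 \ 0 \ 1 \ 0 \ 1 \ 0)

// append a sequential column 1, 2, ..., rows(y) as an explicit tie-breaker
yt = y, (1::rows(y))

// order on column 1, then on column 2 to break ties
p = order(yt, (1, 2))

// tied rows keep their original relative order: displays 2 4 6 1 3 5
p'
end
```

Because the second column has no ties, the permutation in p is fully
determined and does not depend on any seed.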
There is also another way. Mata's -order()- and -sort()- use Stata's
internal sort seed (see -help sortseed-) when ordering ties, so
Brendan can set that seed before each call to -order()- to get
reproducible sorts. Just code
: stata("set sortseed 12345")
We will consider exposing this approach in the documentation, and even
adding a -sortseed()- function to Mata. There is, however, a clarity
in requiring that ties be broken explicitly.
-- Vince
[email protected]
------------ Original message ---------------------------
From: [email protected] (Brendan Halpin)
Subject: st: Mata order() indeterminate
Sender: [email protected]
Lines: 40
I'm checking a simulation in Mata, and find that setting the seed to the
same value does not yield the same results on repeated runs. I've
tracked it down to the use of -order()- to sort a matrix, where there
are many ties. It appears that -order()- brings in indeterminacy in
dealing with ties, but from somewhere other than the random-number system.
This snippet illustrates the issue:
mata:
  x = range(1,10,1)
  y = runiform(10,1):>0.5
  for (i=1; i<=20; i++) {
    rseed(12345)
    x, x[order(y,1),]
  }
end
Though x is unchanged, and the seed is set to the same value at each
pass, x[order(y,1)] changes.
While this is disturbing, I presume it is consistent with Stata policy
with regard to sorting indeterminacy.
How do I get repeatable sorting in this context?
Regards,
Brendan
--
Brendan Halpin, Head of Department, Sociology, University of Limerick, Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-338562; Room F1-002 x 3147
mailto:[email protected] ULSociology on Facebook: http://on.fb.me/fjIK9t
http://teaching.sociology.ul.ie/bhalpin/wordpress twitter:@ULSociology
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/