Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: st: RE: Stata treatment of sort order
From
"Sarah Edgington" <[email protected]>
To
<[email protected]>
Subject
RE: st: RE: Stata treatment of sort order
Date
Thu, 6 Mar 2014 13:47:45 -0800
Andrew,
Your example below still isn't actually an unsorted list. Your original
data is reverse-sorted on the variable in question. Given what little I
know about sorting algorithms (and it is very little and not very
formalized), I still wouldn't expect a well-optimized algorithm to
necessarily change the order of data is already effectively sorted. It
would take more processor time to rearrange that data than to simply leave
it where it is. However, note that if you stick the x=0 observation in the
middle (that is, start with a list that is not already sorted) you do get
different results each time.
These examples where Stata is sorting already sorted data seem to be a
special case. Granted, they're a special case that doesn't strictly
speaking match the behavior the documentation suggests. However, I'm not
sure that outlining the exact circumstances when Stata changes the sort
order and when it doesn't would actually make the documentation clearer. It
strikes me as the sort of detail that might satisfy the pedantic while
confusing the vast majority.
-Sarah
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Andrew Maurer
Sent: Thursday, March 06, 2014 1:01 PM
To: [email protected]
Subject: RE: st: RE: Stata treatment of sort order
Rich,
Thanks for this reference. This is interesting, since I don't know how Stata
could sort datasets without the "`: sortedby'" flag "instantly". Wouldn't
the sort on an already sorted set take at least O(n)? (ie: doesn't the
program need to loop once and verify that x[i] <= x[i+1] for i from 1 to
_N-1?)
Sarah,
Thanks for the response. However, here's an example of an unsorted list,
with repeated values of the sort variable, where the final sort order is
always the same after --sort x--. This seems like it contradicts the
documentation's assertion that, "the ordering of observations with equal
values of varlist is randomized". Perhaps "sometimes randomized" would be
more appropriate.
****** Begin code ******
clear all
set obs 10
gen id = _n
gen x = 1 in 1/9
replace x = 0 in 10
sort x
****** End code ********
Output is always:
id x
10 0
9 1
8 1
7 1
6 1
5 1
4 1
3 1
2 1
1 1
Andrew Maurer
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Richard Goldstein
Sent: Thursday, March 06, 2014 2:20 PM
To: [email protected]
Subject: Re: st: RE: Stata treatment of sort order
actually, the manual specifically deals with this: "Stata may be dumb, but
it is also fast. It sorts already-sorted datasets instantly, so Stata's
ignorance costs us little." p. 603
Rich
On 3/6/14, 3:14 PM, Sarah Edgington wrote:
> Andrew,
> In the example in your second question you're asking Stata to sort the
> data on a variable on which it is already sorted. In that case I
> would not expect Stata to change the ordering of the data at all, with
> or without the stable option. Even though you're pasting in new data
> (so Stata has no knowledge of the existing sort order) I would expect
> that the sorting algorithm would do some checking of whether the data
> was already in the order you requested. Since it is already sorted in
> that order, I wouldn't expect the data to be changed. Admittedly
> that's just a guess since I don't have any information on how Stata
> implements sorting, but it would explain the behavior.
>
> However, you can see that if the data is NOT already sorted on the
> variable of interest that the sort order does change over multiple
> sorts. For example, using the auto data, try to -sort price- then
> -sort foreign-. If you do this multiple times you'll note that the
> ordering is different after -sort foreign-.
>
> -Sarah
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Andrew
> Maurer
> Sent: Thursday, March 06, 2014 11:36 AM
> To: [email protected]
> Subject: st: Stata treatment of sort order
>
> Hi Statalist,
>
> I'm wondering if anyone can help explain some details about Stata and
> sorting
>
> First, where does Stata hold information about current sort order? Ie,
> the extended macro function --`: sortedby'-- returns the current sort
order.
> However, looking at --char dir-- and --macro dir-- I don't see the
> information there. In particular, I want to overwrite the value, so
> that
> --`: sortedby'-- will return the value that I insert. One use might be
> if I -infile-, and I already know the sort order of the data, but
> don't want to have to run sort just to populate `: sortedby'. (In
> --help dta--, I see where it's stored in a physical dta file
> [<sortlist>sortlist</sortlist>], but it doesn't explain where it is put in
memory.
>
> Second, the help file for sort seems somewhat misleading. --help
> sort-- explains, "Without the stable option, the ordering of
> observations with equal values of varlist is randomized." What does
> "randomized" here mean? I interpret it to mean that each residual
> observation has an equal probability of being in any of the slots
> specified by the sort list (eg that --sort
> var1-- is equivalent to --gen rand = runiform()-- --sort var1 rand--
> --drop
> rand-- However, residual sort order doesn't always appear random. For
> example, if I --sysuse auto--, --sort foreign--, then copy the data to
> clipboard, --clear--, then use data editor to paste the data back, and
> finally --sort foreign--, the ordering is always the same as the
> original ordering (ie: the ordering of observations with equal values
> of varlist was /not/ randomized.
>
> Is anyone able to explain these observations?
>
> Thank you,
>
> Andrew Maurer
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/