Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Richard Goldstein <richgold@ix.netcom.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: RE: Stata treatment of sort order |
Date | Thu, 06 Mar 2014 15:20:18 -0500 |
actually, the manual specifically deals with this: "Stata may be dumb, but it is also fast. It sorts already-sorted datasets instantly, so Stata’s ignorance costs us little." p. 603

Rich

On 3/6/14, 3:14 PM, Sarah Edgington wrote:
> Andrew,
> In the example in your second question you're asking Stata to sort the data
> on a variable on which it is already sorted. In that case I would not
> expect Stata to change the ordering of the data at all, with or without the
> stable option. Even though you're pasting in new data (so Stata has no
> knowledge of the existing sort order) I would expect that the sorting
> algorithm would do some checking of whether the data was already in the
> order you requested. Since it is already sorted in that order, I wouldn't
> expect the data to be changed. Admittedly that's just a guess since I don't
> have any information on how Stata implements sorting, but it would explain
> the behavior.
>
> However, you can see that if the data is NOT already sorted on the variable
> of interest that the sort order does change over multiple sorts. For
> example, using the auto data, try to -sort price- then -sort foreign-. If
> you do this multiple times you'll note that the ordering is different after
> -sort foreign-.
>
> -Sarah
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Andrew Maurer
> Sent: Thursday, March 06, 2014 11:36 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Stata treatment of sort order
>
> Hi Statalist,
>
> I'm wondering if anyone can help explain some details about Stata and
> sorting
>
> First, where does Stata hold information about current sort order? Ie, the
> extended macro function --`: sortedby'-- returns the current sort order.
> However, looking at --char dir-- and --macro dir-- I don't see the
> information there.
> In particular, I want to overwrite the value, so that
> --`: sortedby'-- will return the value that I insert. One use might be if I
> -infile-, and I already know the sort order of the data, but don't want to
> have to run sort just to populate `: sortedby'. (In --help dta--, I see
> where it's stored in a physical dta file [<sortlist>sortlist</sortlist>],
> but it doesn't explain where it is put in memory.)
>
> Second, the help file for sort seems somewhat misleading. --help sort--
> explains, "Without the stable option, the ordering of observations with
> equal values of varlist is randomized." What does "randomized" here mean? I
> interpret it to mean that each residual observation has an equal probability
> of being in any of the slots specified by the sort list (e.g., that --sort
> var1-- is equivalent to --gen rand = runiform()-- --sort var1 rand-- --drop
> rand--). However, residual sort order doesn't always appear random. For
> example, if I --sysuse auto--, --sort foreign--, then copy the data to
> clipboard, --clear--, then use data editor to paste the data back, and
> finally --sort foreign--, the ordering is always the same as the original
> ordering (i.e., the ordering of observations with equal values of varlist was
> /not/ randomized).
>
> Is anyone able to explain these observations?
>
> Thank you,
>
> Andrew Maurer
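
The tie-ordering behavior discussed in this thread can be checked directly. What follows is only a minimal sketch, assuming the auto dataset that ships with Stata. The variable names orig, pos_stable, and pos_plain are made up for illustration and do not come from the thread. It sorts on foreign from a data order that is not already sorted on foreign, once with the stable option and once without, and counts how many observations end up in different positions.

```stata
* Minimal sketch, assuming the auto dataset that ships with Stata.
* orig, pos_stable and pos_plain are illustrative names, not from the thread.
sysuse auto, clear
generate long orig = _n

* Start from price order so the data are not already sorted on foreign.
* Sorting on (price orig) makes the starting order unique and reproducible.
sort price orig

* Stable sort: ties on foreign keep the order they had before the sort.
sort foreign, stable
generate long pos_stable = _n

* Plain sort from the same starting order: ties may be reordered.
sort price orig
sort foreign
generate long pos_plain = _n

* Number of observations whose position differs between the two sorts.
count if pos_stable != pos_plain

* Variables by which Stata currently regards the data as sorted.
display "`: sortedby'"
```

A nonzero count would indicate that the plain sort reordered tied observations. Rerunning the plain-sort step may give a different count each time, which would be consistent with Sarah's -sort price- then -sort foreign- suggestion.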