Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: bug in Stata's sorted-by flag
From
Haluk Vahaboglu <[email protected]>
To
"[email protected]" <[email protected]>
Subject
RE: st: bug in Stata's sorted-by flag
Date
Thu, 15 Aug 2013 05:16:04 +0000
Sergiy, thank you for this highly useful discussion. It warned me that what is really happening in the dataset might differ from what I expect.
I ran your program "do http://radyakin.org/statalist/2013081402/sortbug.do" under Ubuntu 64-bit Stata 12.1 and checked the output on screen against the data editor/browser. In my setup, price appears to be sorted correctly.
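For anyone who would rather not compare the output and the browser by eye, here is a minimal sketch of how the same check could be done in code. It assumes that the example file sortbug.dta from the quoted message contains the variable price (the quoted code sorts on it) and that "sorted" means ascending with missing values last, which is how -sort- arranges them. The local macro name flag is my own choice.

```stata
* Sketch only: compare the reported sorted-by flag with the actual order of price
use http://radyakin.org/statalist/2013081402/sortbug.dta, clear
* What Stata reports the data are sorted by (may be empty)
local flag : sortedby
display `"reported as sorted by: `flag'"'
* Count observations whose price is smaller than that of the previous observation
count if price < price[_n-1] & _n > 1
display cond(r(N) == 0, "price is in ascending order", "price is NOT in ascending order")
```

If the flag lists price but the count is nonzero, the flag and the data disagree.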
You mentioned in your message that the incorrect behavior of the -sort- command might be restricted to the Windows environment.
This is my question: is it possible that Stata behaves differently depending on the platform it is running on?
Is this a relevant question, or have I totally misunderstood the issue?
Thanks again
Haluk Vahaboğlu
> Date: Wed, 14 Aug 2013 20:50:31 -0400
> Subject: st: bug in Stata's sorted-by flag
> From: [email protected]
> To: [email protected]
>
> Dear All,
>
> it seems that under some conditions Stata 9.2-12.1 (Windows)
> incorrectly reports that the dataset is sorted while in fact it is
> not.
>
> The following program demonstrates this:
> do http://radyakin.org/statalist/2013081402/sortbug.do
>
> The problem seems to be that Stata's built-in -set obs N- command is
> not clearing the sorted flag while changing the data.
>
>
> Here are some thoughts:
>
> This does have important implications. In particular the sorted state
> is saved into a data file, and other (external) programs might rely on
> it being correct. Stata itself might get confused in some cases, when
> it inspects the sorted state, though I can't readily demonstrate it.
>
> An example of such an inconsistent datafile produced by Stata is here
> (in v12 format):
> http://radyakin.org/statalist/2013081402/sortbug.dta
> or here (in v9 format):
> http://radyakin.org/statalist/2013081402/sortbug9.dta
>
> A technical note in the following document:
> http://www.stata.com/manuals13/dsort.pdf
> explains that Stata is conservative and assumes that any change to
> variables involved in the sort order destroys the sort order.
> This means that sometimes one has to forgo a bit of performance to
> verify the sort order when it is not needed. And this is OK.
>
> The converse is not good. Reporting a dataset as sorted when it is
> not has serious implications, since (at least some) user-written
> commands might rely on the reported sort order being credible.
> Stata's own commands would probably also get confused. I expect (but
> have not checked) the -merge- command to behave erratically in this
> case, since I expect it relies on the saved sort order of the 'using'
> datasets (secondary datasets).
>
> The list of the variables by which a dataset is sorted is returned
> by the extended macro function sortedby, as in:
> display `"`: sortedby'"'
>
> This problem was found as a partial explanation of what is happening
> with the sortpreserve option in my code, the discussion of which
> started in this thread:
> http://www.stata.com/statalist/archive/2013-08/msg00563.html
> and in which I am still interested. Even older discussions on the
> -sort-'s performance can be found in my "sorting data puzzles"
> postings here:
> http://www.stata.com/statalist/archive/2008-01/index.html#00810
>
> Interestingly, you would think that Stata itself should then refuse
> to sort the already sorted dataset. But no, it does re-sort it, as
> can be seen here:
> ********************************************************************
> use http://radyakin.org/statalist/2013081402/sortbug.dta
> list
> describe
> sort price, stable
> list
> describe
> display c(changed)
> ********************************************************************
>
> And given the problem, I am surprised to see that -collapse- continues
> to produce the correct results; it seems to work even though the
> dataset is not sorted.
>
> Best, Sergiy Radyakin
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/