Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: bug in Stata's sorted-by flag
From: Sergiy Radyakin <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: bug in Stata's sorted-by flag
Date: Thu, 15 Aug 2013 08:24:20 -0400
Haluk,
the -sort- command itself works correctly, as far as I can see. The
problem is that the -set obs N- command does not clear the 'sortedby'
flag. If you are not using -set obs N-, you should be fine.
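A minimal sketch of how such a situation could arise (this is not the
sortbug.do file itself; the toy data and the variable name are
assumptions made for illustration). Since . sorts before .t, appending
an observation after a trailing .t leaves the data out of order. On an
affected version the flag would presumably still report price after
-set obs-:

```stata
* Hypothetical toy data, not taken from sortbug.do
clear
set obs 3
generate price = _n
* .t is an extended missing value and sorts after the plain missing .
replace price = .t in 3
* the data are now genuinely sorted by price: 1, 2, .t
sort price
* appends one observation with price = . AFTER the .t observation
set obs 4
* shows what the sorted-by flag claims at this point
display `"sorted by: `: sortedby'"'
list
* a real re-sort moves . ahead of .t
sort price, stable
list
```

If the two lists differ in the order of the last two observations, the
data were not sorted by price after -set obs-, whatever the flag said.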
My example prints two lists in its output, in which a dot-missing (.)
and a .t-missing value switch places. Whether that occurs on
non-Windows systems, I don't know. If you view the data with the
-browse- command after the program completes, you won't see any
abnormality, because the second -sort- fixes the order. By changing the
order of observations, that second -sort- demonstrates that the
original dataset, which Stata reported as sorted, was in fact not.
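One way to check the actual order without trusting the flag is to
compare each observation with its predecessor. The following is only a
sketch under assumptions: the toy data and the variable name price are
made up, and it relies on Stata's ordering of missing values
(nonmissing < . < .t):

```stata
* Hypothetical data entered deliberately out of order: . follows .t
clear
input price
1
2
.t
.
end
* compare every observation with the previous one; ignores the flag
capture assert price >= price[_n-1] if _n > 1
if _rc == 9 {
    display as error "price is NOT in ascending order"
    * force a real sort rather than relying on the reported state
    sort price, stable
}
else {
    display as text "price is in ascending order"
}
list
```

Here the assertion fails with return code 9 because . is smaller than
.t, so the branch that re-sorts the data is taken.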
Best, Sergiy Radyakin
On Thu, Aug 15, 2013 at 1:16 AM, Haluk Vahaboglu <[email protected]> wrote:
> Sergiy, thank you for this highly useful discussion. It warned me that what is really happening in a dataset may differ from what I expect.
> I ran your program "do http://radyakin.org/statalist/2013081402/sortbug.do" under Ubuntu 64-bit Stata 12.1 and checked the output on screen against the "edit browser". In my setup, price appears to be sorted correctly.
> You mentioned in your message that the incorrect behavior of the -sort- command might be restricted to the Windows environment.
> My question is this: is it possible that Stata behaves differently depending on the platform it is running on?
> Is this a relevant question, or have I totally misunderstood the issue?
> Thanks again
>
> Haluk Vahaboğlu
>
>
>> Date: Wed, 14 Aug 2013 20:50:31 -0400
>> Subject: st: bug in Stata's sorted-by flag
>> From: [email protected]
>> To: [email protected]
>>
>> Dear All,
>>
>> it seems that under some conditions Stata 9.2-12.1 (Windows)
>> incorrectly reports that the dataset is sorted while in fact it is
>> not.
>>
>> The following program demonstrates this:
>> do http://radyakin.org/statalist/2013081402/sortbug.do
>>
>> The problem seems to be that Stata's built-in -set obs N- command does
>> not clear the sorted flag while changing the data.
>>
>>
>> Here are some thoughts:
>>
>> This does have important implications. In particular the sorted state
>> is saved into a data file, and other (external) programs might rely on
>> it being correct. Stata itself might get confused in some cases, when
>> it inspects the sorted state, though I can't readily demonstrate it.
>>
>> An example of such an inconsistent datafile produced by Stata is here
>> (in v12 format):
>> http://radyakin.org/statalist/2013081402/sortbug.dta
>> or here (in v9 format):
>> http://radyakin.org/statalist/2013081402/sortbug9.dta
>>
>> A technical note in the following document:
>> http://www.stata.com/manuals13/dsort.pdf
>> explains that Stata is conservative and assumes that any change to
>> variables involved in the sort order destroys the sort order.
>> This means that sometimes one has to forgo a bit of performance to
>> re-verify the sort order when that is not actually needed. And this is OK.
>>
>> The converse is not good. Reporting a dataset as sorted when it is
>> not has serious implications, as (at least some) user-written
>> commands might rely on the reported sort order being credible.
>> Stata's own commands would probably also get confused. I expect (but
>> have not checked) the -merge- command to behave erratically in this case,
>> since I expect it relies on the saved sort order for the 'using'
>> datasets (secondary datasets).
>>
>> The list of variables by which a dataset is sorted is returned by
>> the extended macro function sortedby, as in:
>> display `"`: sortedby'"'
>>
>> I found this problem as a partial explanation of what's happening with
>> the sortpreserve option in my code. That discussion started in this
>> thread:
>> http://www.stata.com/statalist/archive/2013-08/msg00563.html
>> and I am still interested in it. Even older discussions of
>> -sort-'s performance can be found in my "sorting data puzzles"
>> postings here:
>> http://www.stata.com/statalist/archive/2008-01/index.html#00810
>>
>> Interestingly, you would think that Stata itself should then refuse to
>> sort a dataset it considers already sorted. But no, it does re-sort it,
>> as can be seen here:
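>> ********************************************************************
>> * load the inconsistent dataset, which is flagged as sorted
>> use http://radyakin.org/statalist/2013081402/sortbug.dta
>> list
>> describe
>> * re-sort on the same variable and compare the two listings
>> sort price, stable
>> list
>> describe
>> * c(changed) is 1 if the data in memory changed since last saved
>> display c(changed)
>> ********************************************************************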
>>
>> And given the problem, I am surprised to see that -collapse- continues
>> to produce the correct results; it seems to work even though the
>> dataset is not sorted.
>>
>> Best, Sergiy Radyakin
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/