Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Sergiy Radyakin <serjradyakin@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: bug in Stata's sorted-by flag
Date: Thu, 15 Aug 2013 13:10:27 -0400
Dear Bill,

thank you very much for looking into this case and for your intent to fix it.

I have found that -describe- is not the only thing that misreports the sort order. A trivial check of the sort order also fails:

assert `vn'>=`vn'[_n-1] if _n>1

even though this must hold for any dataset sorted in ascending order on `vn'.

A program counting unique values would also give a wrong answer if it trusted `sortedby' to mean that the data are sorted by `vn':

count if `vn'!=`vn'[_n-1] & _n>1
display r(N)+1

Also, since the problem occurs with strings as well (see the first test with the make variable), it is not limited to EXTENDED missing values.

However, the few Stata commands that I tested on such an inconsistent dataset (-codebook-, -inspect- and other likely candidates for a problem) indeed worked fine.

Finally, I have found that instead of -set obs N- I can use -expand M in L-, where L is literally the letter L (Stata's shorthand for the last observation) and M is one more than the desired increment, because -expand- replaces the observation with M copies of itself. Since I am replacing the values anyway, I don't need them to be missing to begin with. Interestingly enough, in this case -expand- DOES reset the sortedby flag, and this is exactly the case in which it could leave the flag as is, since appending copies of the last observation to the end of the dataset does not distort the sort order. What an irony.

Best regards,
Sergiy Radyakin

On Thu, Aug 15, 2013 at 11:30 AM, William Gould, StataCorp LP <wgould@stata.com> wrote:
> Sergiy Radyakin <serjradyakin@gmail.com> reports,
>
>> it seems that under some conditions Stata 9.2-12.1 (Windows)
>> incorrectly reports that the dataset is sorted while in fact it is
>> not.
>
> Sergiy reports that this happens when
>
> 1. The data are sorted by a variable or variables, say myvar.
>
> 2. One or more observations of myvar contain EXTENDED missing
> values (.a, .b, ..., .z).
>
> 3. -set obs- is used to add extra observations to the end
> of the dataset.
>
> The data are fine, but -describe- will report that the data are sorted
> by myvar, which is not true because . < .a < .b < ... < .z.
>
> In most cases the bug has no implications beyond the mistaken
> -describe-, which is why it's gone undiscovered for 8+ years.
>
> We will fix it.
>
> In the meantime, the workaround is to -sort- the data after -set obs-.
> You must sort on an extraneous variable,
>
> . set obs ...
> . sort a
> . sort myvar
>
> You might worry that, because the internal sort marker is incorrect,
> lots of other problems could arise. In general, that would be true.
> In this case, however, such problems do not arise because all the
> misordering occurs within missing values.
>
>
> -- Bill
> wgould@stata.com
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
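A minimal do-file sketch of the reported behavior and of Bill's workaround follows. It assumes the shipped auto dataset; the choice of rep78 as the sort variable, the .a injected into observation 1, and the target of 80 observations are illustrative assumptions, not taken from the thread. What the sorted-by macro shows on versions released after the fix is not covered in the thread, so the sketch makes no claim about it.

```stata
* Sketch: reproduce the reported sorted-by inconsistency, then apply the workaround.
* Assumptions (not from the thread): the shipped auto data, rep78 as the sort
* variable, an injected .a in observation 1, and 80 as the new observation count.
sysuse auto, clear
replace rep78 = .a in 1     // inject an EXTENDED missing value
sort rep78                  // order is now: 1..5, then ., then .a
set obs 80                  // appends 6 observations with rep78 == .

* Per the report, affected versions still list rep78 as the sort key here
local sb : sortedby
display "sorted by: `sb'"

* The trivial check fails: the new . values follow .a, and . < .a
capture noisily assert rep78 >= rep78[_n-1] if _n > 1

* Workaround from the thread: sort on some other variable first, then re-sort
sort make
sort rep78
assert rep78 >= rep78[_n-1] if _n > 1   // holds after the re-sort
local sb : sortedby
display "sorted by: `sb'"
```

The -capture noisily- prefix lets the failing assertion print its message without stopping the do-file, so the workaround lines still run.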