John LeBlanc <[email protected]> et al.:
I would make a stronger statement than John-Paul Ferguson: it's
probably impossible to do in the general case, because different fonts or
character encodings can map characters that differ only by a diacritical
mark to different codes. If you can specify the mapping you want (between
characters and numeric codes), you can write a gsort2.ado that will
sort as you want. But you can also just generate a new variable that
sorts as you want, which is all a gsort2.ado would do, so there
is little to be gained. To see the order in which Stata sorts the
characters in your strings, display them by numeric code:
forv i=32/255 {
di char(`i') _c
}
and note that capital letters get sorted before lower-case, which come
before all characters with diacritical marks. So you can predict how
this will come out:
clear
input str2 a
ok
Ok
no
zz
ök
end
sort a
li
Also note that different people might want different orderings, even if
numeric codes were perfectly stable. Consider, for example, ö in Swedish
or German:
http://en.wikipedia.org/wiki/Swedish_alphabet
http://en.wikipedia.org/wiki/German_alphabet#Sorting
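The new-variable approach mentioned above can be sketched as follows. This is only a minimal illustration, not a general solution: the variable names key and key2 follow Svend's example quoted below, the data are John's three words, and it assumes the accented character is é and that the do-file is saved in the same encoding Stata uses for strings. Each further accented character would need its own subinstr() line.

```stata
* Sketch: sort on an accent-stripped copy of the string variable
clear
input str10 key
ecole
school
école
end

* key2 is a copy of key with é replaced by plain e
gen str10 key2 = key
replace key2 = subinstr(key2, "é", "e", .)

* Sort on the stripped key first; key breaks ties, so that
* the unaccented spelling comes before the accented one
sort key2 key
list key
```

After the sort, key should read ecole, école, school, which is the order John asked for. The mapping from accented to unaccented characters stays explicit, so anyone who wants a different ordering (Swedish versus German, say) can change the replacements.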
On Wed, Jun 18, 2008 at 10:13 PM, John-Paul Ferguson <[email protected]> wrote:
> Looking at the source for gsort reveals that it's mostly engaged in macro
> manipulation with an occasional call to sort to do the basic work. Since
> sort itself is a built-in command, it would almost HAVE to be Stata that
> made any such modification.
>
> John-Paul Ferguson
>
> Quoting John LeBlanc <[email protected]>:
>
>> Thanks; I was hoping that Stata had a built-in option to ignore accents.
>> Some software with sort routines can give characters with diacritical
>> marks the same value as their unaccented counterparts. Is this not an
>> issue for non-English Stata users? Is there sufficient desire to justify
>> asking Stata for this feature, e.g., as an option to gsort?
>>
>>
>> John
>>
>> On Wed, 18 Jun 2008 12:53:14 +0200, Svend Juul wrote:
>>
>> John LeBlanc wrote:
>>
>> How does one ignore accents while sorting international characters?
>>
>> sort & gsort deliver this:
>>
>> ecole
>> school
>> école
>>
>> What I'd like is this:
>> ecole
>> école
>> school
>>
>> ============================================================
>>
>> I believe that you must generate a second variable with no accents
>> to get it right:
>>
>> gen str10 key2=key
>> replace key2 = subinstr(key2,"é","e",.)
>> replace key2 = subinstr(key2,"�","o",.)
>> ...
>> sort key2 key
>>
>> I included key as a secondary sort key to make é come after e.
>>
>> Hope this helps
>> Svend
>>
>>