Re: Re: st: foreign language symbols not recognized in string variables
From
Christopher Baum <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: Re: st: foreign language symbols not recognized in string variables
Date
Sat, 27 Apr 2013 12:47:23 +0000
On Apr 27, 2013, at 2:33 AM, Sergiy wrote:
>
> Most modern software (OS and applications) work with Unicode. Stata
> does not work with Unicode. Unicode encodes characters with 2 or more
> bytes. In Stata each character must be 1 byte only. You need to make
> sure the input CSV file is encoded in a codepage proper for your
> region, presumably 1252.
This is oversimplified and somewhat misleading. Unicode comes in several encodings. As Sergiy says, it can be used to represent all the world's alphabets (and more); in UTF-16, most characters occupy a single 16-bit (2-byte) unit. But there is also UTF-8, built on 8-bit (1-byte) units, in which every ASCII character is represented by a single byte, as Stata expects, while all other characters take two to four bytes.
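To check how many bytes each encoding actually spends per character, here is a minimal Python sketch. Python is used only for illustration; the sample characters and the helper name `encoded_length` are arbitrary choices, not anything from Stata. It suggests that ASCII letters take one byte in UTF-8, while accented and Cyrillic letters take more.

```python
# Sketch: count the bytes used per character in UTF-8 and UTF-16.
# The sample characters below are illustrative assumptions.

def encoded_length(ch, encoding):
    """Return the number of bytes needed to store ch in the given encoding."""
    return len(ch.encode(encoding))

samples = [
    "A",       # plain ASCII letter
    "\u00e9",  # e with acute accent (also in ISO Latin-1)
    "\u0159",  # r with caron (Czech)
    "\u042f",  # Cyrillic capital Ya
    "\u20ac",  # euro sign
]

for ch in samples:
    # "utf-16-le" is used so that no byte-order mark is counted.
    print(f"U+{ord(ch):04X}",
          "utf-8:", encoded_length(ch, "utf-8"),
          "utf-16:", encoded_length(ch, "utf-16-le"))
```

Only the first sample, the ASCII letter, fits in a single UTF-8 byte.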
The relevant constraint is not that Unicode data are necessarily two-byte characters, but that they are not ASCII (or EBCDIC) characters. At present, Stata does not cope well with non-ASCII characters, such as those that would be present in UTF-8 text in a language such as Czech or Turkish, which use accented characters not available in ASCII (and, in several cases, not in ISO Latin-1 either), or in languages that use different alphabets, such as Russian or Ukrainian. We can hope that someday this constraint will be removed, and Stata will be able to deal with (at the very least) UTF-8 encodings.
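One way to see whether a given input line would run into this constraint is to look for bytes outside the 7-bit ASCII range before importing it. The following is a rough Python sketch under that assumption; the function name `non_ascii_positions` and the sample text are hypothetical, and it says nothing about how Stata itself reads the file.

```python
# Sketch: locate bytes outside the 7-bit ASCII range in raw input.
# The function name and sample data are illustrative assumptions.

def non_ascii_positions(data: bytes):
    """Return (offset, byte value) pairs for every byte above 0x7F."""
    return [(i, b) for i, b in enumerate(data) if b > 0x7F]

# "Dvorak" spelled with r-caron (U+0159) and a-acute (U+00E1), stored as UTF-8.
line = "Dvo\u0159\u00e1k,Praha".encode("utf-8")

hits = non_ascii_positions(line)
if hits:
    print("non-ASCII bytes at offsets:", [offset for offset, _ in hits])
else:
    print("pure ASCII")
```

A pure-ASCII line yields an empty list; here each of the two accented letters contributes two non-ASCII bytes.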
It is a great advantage of Unicode (UTF-8) that one need not encode files using a particular 'code page' (a DOS anachronism). Those contributing metadata to RePEc, for instance, need only use UTF-8, and every character will be properly handled by the 'modern software' that massages that metadata for display.
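The code-page problem can be illustrated with a short Python sketch: each Windows code page covers only some alphabets, whereas UTF-8 can encode all of the samples. The helper `can_encode` and the two sample characters are assumptions made for the example.

```python
# Sketch: which encodings can represent a Czech and a Russian letter?
# The helper name and the sample characters are illustrative assumptions.

def can_encode(text, encoding):
    """Return True if text can be stored in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

czech = "\u0159"    # r with caron
russian = "\u042f"  # Cyrillic capital Ya

for enc in ("cp1250", "cp1251", "cp1252", "utf-8"):
    print(enc, "Czech:", can_encode(czech, enc),
          "Russian:", can_encode(russian, enc))
```

Of the four encodings tried, only UTF-8 accepts both letters, so a file mixing the two languages cannot be stored in any single one of these code pages.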
Cheers
Kit
Kit Baum | Boston College Economics & DIW Berlin | http://ideas.repec.org/e/pba1.html
An Introduction to Stata Programming | http://www.stata-press.com/books/isp.html
An Introduction to Modern Econometrics Using Stata | http://www.stata-press.com/books/imeus.html
| http://www.crup.com.cn/Item/111779.aspx