st: Stata's character encoding
From: Billy Schwartz <[email protected]>
To: [email protected]
Subject: st: Stata's character encoding
Date: Mon, 23 Jul 2012 12:03:00 -0400
I'm trying to automatically generate some Stata scripts from an
external program* that by default encodes all text files as UTF-8.
As best I can tell, Stata reads Do-files and spreadsheet data using
whatever character encoding is native to the platform it's on (e.g.,
Windows-1252 on Windows), which means the only portable character
encoding is plain ASCII (no characters with code points above 127).
Stata also seems flexible about whether line endings are LF or CRLF,
but they must be consistent within a given file: I've had problems
loading spreadsheet data that were CRLF line-terminated but had stray
CRs distributed randomly throughout the data, which made Stata think
there were line endings where there weren't.
Is this a correct characterization of the way Stata reads text files?
If not, what's the most portable way for me to encode text for both
Do-files and spreadsheet data?
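
The two constraints described above (plain ASCII only, and one
line-ending convention per file) could be checked before a file is
handed to Stata. The following is a minimal Python sketch, not a
statement of what Stata actually requires; the function name
check_portable is hypothetical, and it only encodes the assumptions
made in this message.

```python
import re


def check_portable(data):
    """Return a list of problems found in raw file bytes; empty if none.

    Hypothetical helper: it tests the assumptions stated in this message
    (ASCII only, consistent line endings), not documented Stata behavior.
    """
    problems = []

    # Any byte above 127 means the content is not plain ASCII.
    non_ascii = [i for i, b in enumerate(data) if b > 127]
    if non_ascii:
        problems.append(
            "%d non-ASCII byte(s), first at offset %d"
            % (len(non_ascii), non_ascii[0])
        )

    crlf = data.count(b"\r\n")
    bare_cr = len(re.findall(rb"\r(?!\n)", data))    # CR not followed by LF
    bare_lf = len(re.findall(rb"(?<!\r)\n", data))   # LF not preceded by CR

    if bare_cr:
        problems.append("%d bare CR(s) not followed by LF" % bare_cr)
    if crlf and bare_lf:
        problems.append("mixed CRLF and LF line endings")
    return problems
```

Reading the file in binary mode (open(path, "rb").read()) and passing
the bytes to this function avoids any newline translation by Python
itself, so stray CRs remain visible.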
-----------
*I'm writing Python scripts to write Stata scripts because I expect my
input data to change several times and I don't want to have to rewrite
Stata code by hand each time the underlying data changes. I find
examining directory structures and reading non-tabular data (in this
case, the record layouts for the data I'm working with) easier to
express in Python than in Stata. I'm open to suggestions on best ways
to deal with this, but since I've got it mostly written, that's not
the main goal of this email.
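
On the Python side, one way to guarantee that a generated script stays
within plain ASCII with LF line endings is to encode the text
explicitly and write the bytes in binary mode. This is a sketch under
those assumptions; write_do_file and the file names in the comment are
hypothetical.

```python
def write_do_file(path, lines):
    """Write Stata commands to `path` as plain ASCII with LF line endings.

    Hypothetical helper. Encoding before opening the file means a
    non-ASCII character raises UnicodeEncodeError before anything is
    written to disk.
    """
    text = "".join(line + "\n" for line in lines)
    data = text.encode("ascii")  # fails on any code point above 127
    # Binary mode: Python performs no newline translation on any platform.
    with open(path, "wb") as f:
        f.write(data)


# Example call (file names are placeholders):
# write_do_file("load_data.do", ['use "mydata.dta", clear', "describe"])
```

Failing loudly on non-ASCII input, rather than silently replacing
characters, makes it obvious which generated line needs attention.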