Statalist



Re: st: Insheeting Japanese


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: Insheeting Japanese
Date   Tue, 23 Sep 2008 14:28:51 -0400

Dan, I don't know if Stata can read Unicode. The -help- for -insheet- states that it is for ASCII text. One possibility: use a text editor to add double quotes (") at the beginning and end of each line and on either side of the commas. This may make Stata read everything as character (string) data. Then convert only the variable you want back to numeric.

-Steve

On Sep 23, 2008, at 2:19 PM, Dan Weitzenfeld wrote:


I've been informed that the files are written in unicode, utf-16.  Can
Stata read this?
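
One possible workaround, sketched here in Python rather than Stata, is to decode the UTF-16 file outside Stata and write out only the numeric second column as plain ASCII, which -insheet- is documented to handle. The function name, the sample rows, and the "answer" column header are hypothetical; the sketch assumes the file begins with a byte-order mark, as the "þÿ" at the start of the garbled output suggests.

```python
import csv
import io


def extract_second_column(raw: bytes) -> list[str]:
    """Decode UTF-16 bytes (the BOM decides the byte order) and
    return the second comma-separated field of every row."""
    text = raw.decode("utf-16")  # the BOM is consumed by the codec
    reader = csv.reader(io.StringIO(text, newline=""))
    return [row[1] for row in reader if len(row) > 1]


# Hypothetical sample rows shaped like the line quoted in this thread.
sample = "あなたはこのCMが好きですか?,0,とても好き\r\nか?,1,2\r\n".encode("utf-16")

values = extract_second_column(sample)

# Plain ASCII text with a made-up header, ready to save and -insheet-.
ascii_csv = "answer\n" + "\n".join(values) + "\n"
print(ascii_csv)
```

For a real file, the bytes would come from open(path, "rb").read(), and ascii_csv would be written to a new file that Stata then reads.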

On Tue, Sep 23, 2008 at 11:08 AM, Dan Weitzenfeld
<[email protected]> wrote:
Thanks Sergiy, I did not know about that command. Below is a line
from my hexdump:

130 | 304b ff1f 002c 0031 002c 0032 000d 000a | 0K...,. 1.,.2....

I also noticed this when I ran with option Analyze:

Line-end characters
\r\n (Windows) 0
\r by itself (Mac) 5
\n by itself (Unix) 5

which looks suspicious to me. I'll talk to the tech guys who made this file.
Thanks again Sergiy.
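
The hexdump line and the line-end counts above can be checked together with a short Python sketch. It assumes the 16-bit words are big-endian UTF-16 code units, which is what the "þÿ" (bytes FE FF) at the start of the garbled -insheet- output suggests. Under that assumption every carriage return and line feed is preceded by a zero byte, so a byte-level scan would never see an adjacent \r\n pair and would report equal numbers of lone \r and lone \n, the same pattern as the Analyze output.

```python
# The eight 16-bit words from the hexdump line quoted above.
hex_words = "304b ff1f 002c 0031 002c 0032 000d 000a"
raw = bytes.fromhex(hex_words.replace(" ", ""))

# Assumption: big-endian UTF-16, inferred from the FE FF byte-order mark.
decoded = raw.decode("utf-16-be")
print(repr(decoded))

# Count line ends the way a byte-oriented tool would.
crlf_pairs = raw.count(b"\r\n")          # adjacent 0d 0a bytes
lone_cr = raw.count(b"\r") - crlf_pairs  # 0d not followed by 0a
lone_lf = raw.count(b"\n") - crlf_pairs  # 0a not preceded by 0d
print(crlf_pairs, lone_cr, lone_lf)
```

The decoded text is a hiragana "ka", a fullwidth question mark, then ",1,2" and a Windows line end, so the file itself looks like a normal CSV once it is read as UTF-16.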



On Tue, Sep 23, 2008 at 10:51 AM, Sergiy Radyakin
<[email protected]> wrote:

Dear Dan,

How data "looks" depends on which software "looks" at it. From
what I see in your message, there is a double-byte encoding of letters,
which may cause a problem.

I suggest you first "look" at your data byte-by-byte, to find a
pattern you need, then filter your data based on that pattern.
Use
-hexdump- filename
to see how your data is structured. Check that you are using the correct
separator ("comma" and not "tab"), that the "comma" in your file is indeed a
standard ASCII "comma" and not some weird two-byte comma, that the
"comma" byte (44) is not used for encoding other characters, etc.
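
The last check, whether byte 44 (hex 2C) also appears inside other characters, can be sketched in Python. This is only an illustration: it assumes the file is big-endian UTF-16, as reported elsewhere in this thread, and the function name and sample string are hypothetical.

```python
def comma_byte_report(raw: bytes) -> dict:
    """Compare raw 0x2C bytes with genuine U+002C commas,
    treating the input as big-endian UTF-16 code units."""
    total = raw.count(b",")
    units = [raw[i:i + 2] for i in range(0, len(raw) - 1, 2)]
    real = sum(1 for unit in units if unit == b"\x00,")
    return {
        "bytes_0x2c": total,
        "real_commas": real,
        "inside_other_chars": total - real,
    }


# Hypothetical sample: U+302C is encoded as bytes 30 2C, so it
# contains a comma byte without being a comma.
sample = "\u302c,1".encode("utf-16-be")
report = comma_byte_report(sample)
print(report)
```

A nonzero "inside_other_chars" count means that splitting the raw bytes on commas would cut some characters in half.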

Perhaps you could post a portion of output from hexdump here if this
does not contradict any rules of the list.

Regards, Sergiy Radyakin


On Tue, Sep 23, 2008 at 1:09 PM, Dan Weitzenfeld
<[email protected]> wrote:

Hi All,
Quick but strange question. I'm trying to insheet a comma-delimited
file with Japanese in it. For example, the first line looks like:

あなたはこのCMが好きですか?,0,とても好き

The only information I need is the second variable, the 0, which will
always be numeric.

However, when I insheet the file, I get nonsense:

þÿ0B0j0_0o0S0nÿ#ÿ-0LY}0M0g0Y0Kÿ 0h0f0‚Y}0M

which would be okay, except that the second variable always comes in as blank.
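
The nonsense above is consistent with UTF-16 bytes being displayed one byte at a time. A small Python sketch, assuming big-endian UTF-16 with a byte-order mark and a Windows-1252 display (both inferred from the garbled output, not confirmed), reproduces its opening characters. It also shows that splitting the raw bytes on the comma byte leaves zero bytes around the "0". That may be why the second variable comes in blank, but this is one plausible reading and not something the thread confirms.

```python
# First six characters of the Japanese line, written as escapes.
line = "\u3042\u306a\u305f\u306f\u3053\u306e"

# Assumption: big-endian UTF-16 preceded by the FE FF byte-order mark.
raw = b"\xfe\xff" + line.encode("utf-16-be")

# Display the bytes as if they were single-byte Windows-1252 text.
garbled = raw.decode("cp1252", errors="replace")
print(garbled)

# A shortened row: fullwidth question mark, comma, 0, comma, one kana.
row = b"\xfe\xff" + "\uff1f,0,\u3068".encode("utf-16-be")
fields = row.split(b",")
print(fields[1])  # the "0" field with a zero byte on each side
```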

Does anyone know of a solution for this?

Thanks in advance,
Dan

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
