Dan Weitzenfeld :
Stata's -file- command can deal with this file; see -help file- for
examples of writing a loop to process a file. But converting in
another program, then using -infile- or -insheet-, is likely easier.
The optimal approach depends on how often you will face this situation
again in future...
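
For the "convert in another program" route, here is a minimal sketch in Python rather than Stata. It assumes the file starts with a UTF-16 byte-order mark (the "þÿ" at the start of the -insheet- output suggests it does), that the separator is a plain comma, and that the second field is always ASCII numeric. The function names, the file names, and the "answer" column header are all hypothetical.

```python
import csv
import io


def extract_second_field(raw):
    """Decode UTF-16 bytes and return the second comma-separated field of each line."""
    # The "utf-16" codec reads the byte-order mark (FE FF or FF FE) and
    # selects the matching byte order, so the BOM never reaches the data.
    text = raw.decode("utf-16")
    # newline="" leaves \r\n untouched so the csv module can handle line ends.
    rows = csv.reader(io.StringIO(text, newline=""))
    # Skip blank or short lines rather than failing on them.
    return [row[1] for row in rows if len(row) > 1]


def convert(src_path, dst_path):
    """Write the second field of a UTF-16 file to a one-column ASCII file."""
    with open(src_path, "rb") as src:
        values = extract_second_field(src.read())
    # encoding="ascii" is deliberate: a non-ASCII value raises an error here
    # instead of silently producing a file that -insheet- would misread.
    with open(dst_path, "w", encoding="ascii", newline="\n") as dst:
        dst.write("answer\n")  # hypothetical variable name for Stata
        dst.writelines(value + "\n" for value in values)
    return len(values)
```

Calling convert("survey_utf16.csv", "answers.csv") should leave a plain ASCII file that -insheet using answers.csv- can read. The Japanese text is dropped on purpose, since only the numeric field is wanted.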
On Tue, Sep 23, 2008 at 2:28 PM, Steven Samuels
<[email protected]> wrote:
> Dan, I don't know if Stata can read Unicode. The -help- for -insheet-
> states it is for ASCII text. One possibility: use a text editor to add
> double quotes (") at the beginning and end of lines and on either side of
> the commas. This may read everything as character. Then convert back to
> real only the variable you want.
>
> -Steve
>
> On Sep 23, 2008, at 2:19 PM, Dan Weitzenfeld wrote:
>
>> I've been informed that the files are written in unicode, utf-16. Can
>> Stata read this?
>>
>> On Tue, Sep 23, 2008 at 11:08 AM, Dan Weitzenfeld
>> <[email protected]> wrote:
>>>
>>> Thanks Sergiy, I did not know about that command. Below is a line
>>> from my hexdump:
>>>
>>> 130 | 304b ff1f 002c 0031 002c 0032 000d 000a | 0K...,.1.,.2....
>>>
>>> I also noticed this when I ran -hexdump- with the analyze option:
>>>
>>> Line-end characters
>>> \r\n (Windows) 0
>>> \r by itself (Mac) 5
>>> \n by itself (Unix) 5
>>>
>>> which looks suspicious to me. I'll talk to the tech guys who made this
>>> file.
>>> Thanks again Sergiy.
>>>
>>>
>>>
>>> On Tue, Sep 23, 2008 at 10:51 AM, Sergiy Radyakin
>>> <[email protected]> wrote:
>>>>
>>>> Dear Dan,
>>>>
>>>> How data "looks" depends on which software "looks" at it. From
>>>> what I see in your message, there is double-byte encoding of letters,
>>>> which may cause a problem.
>>>>
>>>> I suggest you first "look" at your data byte-by-byte, to find a
>>>> pattern you need, then filter your data based on that pattern.
>>>> Use
>>>> -hexdump- filename
>>>> to see how your data is structured. Check that you are using the correct
>>>> separator ("comma" and not "tab"), that the "comma" in your file is indeed
>>>> a standard ASCII comma and not some two-byte comma, and that the "comma"
>>>> byte (44) is not used for encoding other characters, etc.
>>>>
>>>> Perhaps you could post a portion of output from hexdump here if this
>>>> does not contradict any rules of the list.
>>>>
>>>> Regards, Sergiy Radyakin
>>>>
>>>>
>>>> On Tue, Sep 23, 2008 at 1:09 PM, Dan Weitzenfeld
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi All,
>>>>> Quick but strange question. I'm trying to insheet a comma-delimited
>>>>> file with Japanese in it. For example, the first line looks like:
>>>>>
>>>>> あなたはこのCMが好きですか?,0,とても好き
>>>>>
>>>>> The only information I need is the second variable, the 0, which will
>>>>> always be numeric.
>>>>>
>>>>> However, when I insheet the file, I get nonsense:
>>>>>
>>>>> þÿ0B0j0_0o0S0nÿ#ÿ-0LY}0M0g0Y0Kÿ 0h0f0‚Y}0M
>>>>>
>>>>> which would be okay, except that the second variable always comes in as
>>>>> blank.
>>>>>
>>>>> Does anyone know of a solution for this?
>>>>>
>>>>> Thanks in advance,
>>>>> Dan
>>>>>
>>>>> *
>>>>> * For searches and help try:
>>>>> * http://www.stata.com/help.cgi?search
>>>>> * http://www.stata.com/support/statalist/faq
>>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>>