On Mon, Nov 10, 2008 at 11:58 AM, Neil Shephard <[email protected]> wrote:
> Ada Ma wrote:
>> Hi Statalist,
>>
>> Is there a way to stop Stata seeing double quotes as delimiters? I
>> have data files in txt format, the data is pipe (|) delimited, but the
>> people who generated the data also use a double quote (") to mark
>> missing values, so the txt files contain a large number of pipes
>> with the occasional double quote between them.
>>
>> I can read the data in Stata fine - only if I open up the text files
>> and remove all the double quotes before I -insheet- the data with pipe
>> specified as the delimiter. It would be nice if I don't have to check
>> for double quotes first because it would save me the time opening up
>> the data files twice - first for getting rid of double quotes and
>> another for reading it into Stata.
>>
> Without seeing an example I don't understand the problem. It sounds as
> though you are using the -delimiter("char")- option, e.g.
>
> insheet using [path/to/your/file/filename], delim("|") clear
>
> So it's irrelevant what the people who generated the data used to specify
> the missing variable (which you indicate to be double quotes), the
> delimiter is "|" and is explicitly defined and anything between these
> delimiters is considered by Stata to be a variable.
>
> This may result in some data that is intended to be numeric being read
> as string, but you can -destring- or otherwise convert afterwards.
>
> Neil
Hi Neil,
Thanks for the reply. Here is an example I have created which is
close to what happened. The data should look like this:
epikey hrg code1 code2 code3
1 A0123 D100 V123 K166
2 A0125 D200 " G122
3 B0101 D300 " C333
4 B0122 D400 E002 V777
It is pipe delimited so in the text file it looks like this:
epikey|hrg|code1|code2|code3
1|A0123|D100|V123|K166
2|A0125|D200|"|G122
3|B0101|D300|"|C333
4|B0122|D400|E002|V777
When I specified the command as you stated above, i.e. specifying the
delim("|") option, Stata reads in this:
epikey hrg code1 code2 code3
1 A0123 D100 V123 K166
2 A0125 D200 |G1223|B0101|D300| C333
4 B0122 D400 E002 V777
So everything between the two double quotes, including the line break,
is treated as one string. Is there any way to get around this without
editing the txt file?
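In case it helps anyone reproduce this, here is a minimal sketch that writes the example file from within Stata and reads it back with the pipe delimiter. The file name example.txt is made up for illustration, and char(34) is used only as a convenient way to write a bare double quote.

```stata
* Hypothetical file name, for illustration only; any writable path works.
local f "example.txt"

* Write the five lines of the example file.
* char(34) is the double-quote character; passing it as an expression
* avoids having to nest a bare quote inside a quoted string.
tempname fh
file open `fh' using "`f'", write text replace
file write `fh' "epikey|hrg|code1|code2|code3" _n
file write `fh' "1|A0123|D100|V123|K166" _n
file write `fh' "2|A0125|D200|" (char(34)) "|G122" _n
file write `fh' "3|B0101|D300|" (char(34)) "|C333" _n
file write `fh' "4|B0122|D400|E002|V777" _n
file close `fh'

* Read it back with the pipe as the delimiter, as suggested above.
insheet using "`f'", delimiter("|") clear
list
```

If the behaviour is the same as with my real files, -list- should show rows 2 and 3 merged into one observation.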
Thanks again!
Ada