Thanks a lot to both for the solutions you have suggested. I think
the -filefilter- command will be the easiest to implement given that
I'm on a Windoze system!
Regards,
Ada
On Mon, Nov 10, 2008 at 12:59 PM, Nick Cox <[email protected]> wrote:
> Utilities like sed are a good idea; as Neil says, they have been ported
> to Windows too (GNU project as well as the sources he cites).
>
> But check out -filefilter- in Stata.
>
> [D]     filefilter . . . . . Convert ASCII text or binary patterns in a file
>         (help filefilter)
>
> FAQ     . . . . . . . . . . . . . . . . . . . Malformed end-of-line sequence
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J. Hassell
>         12/03   Why do I get rows of missing data when I use infile?
>                 http://www.stata.com/support/faqs/data/miss_data.html
>
> SJ-8-2  pr0039 . Stata tip 60: Fast and easy changes to files with filefilter
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. R. Riley
>         Q2/08   SJ 8(2):290--292                               (no commands)
>         tip on how to make changes to a file using the filefilter command
>
>
> I would pre-process the file so that double quotes were edited to
> something else. The character @ is often a good candidate.
>
> You can check with -hexdump- which characters are used in the file.
>
> The FAQ and Stata Tip give detailed examples.
>
> Nick
> [email protected]
>
> Neil Shephard
>
> Ada Ma wrote:
>> Thanks for the reply. Here is an example I have created which is
>> close to what happened. The data should look like this:
>>
>> epikey  hrg    code1  code2  code3
>> 1       A0123  D100   V123   K166
>> 2       A0125  D200   "      G122
>> 3       B0101  D300   "      C333
>> 4       B0122  D400   E002   V777
>>
>> It is pipe delimited so in the text file it looks like this:
>>
>> epikey|hrg|code1|code2|code3
>> 1|A0123|D100|V123|K166
>> 2|A0125|D200|"|G122
>> 3|B0101|D300|"|C333
>> 4|B0122|D400|E002|V777
>>
>> When I specified the command as you stated above, i.e. specifying the
>> delim("|") option, Stata reads in this:
>>
>> epikey  hrg    code1  code2               code3
>> 1       A0123  D100   V123                K166
>> 2       A0125  D200   |G1223|B0101|D300|  C333
>> 4       B0122  D400   E002                V777
>>
>> So everything between the double quotes is treated as one string. Is
>> there any way to get around this without editing the txt file?
>>
>>
> Hmm, that is problematic, and not quite what I'd expect, but I can see
> clearly why it's happening. Stata sees the first double quote and
> assumes that it is encapsulating a string variable, and reads until it
> sees the next (closing) double quote, treating any pipes ("|") as
> part of the string.
>
> I'm not sure how to work around this in Stata I'm afraid. You may gain
> some mileage writing a custom dictionary and using -infile-.
>
> Personally I would make a system call to the common *NIX-like command
> 'sed' to search and replace any instances of double-quotes. This has
> the advantage of being automated as the system call can be placed in
> your do-file (as opposed to manually opening the file in your text
> editor and doing the search and replace). At the same time it has the
> disadvantage of not being handled internally in Stata, making it
> somewhat less platform neutral (would probably work fine on Linux and
> Macs, but you'd have to have some trickery to call sed under a Cygwin
> installation under Windows; I've done it in the past, but can't quite
> remember the finer details). There may be a similar command (or indeed
> native version of sed) under M$-windows Command Prompt, but I'm not
> aware of it.
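A minimal sketch of the sed search-and-replace described above, assuming a POSIX-style shell with sed on the path. The file names raw.txt and clean.txt are hypothetical, the sample rows are the ones from the example earlier in the thread, and @ as the replacement character follows Nick's suggestion.

```shell
# Recreate the pipe-delimited sample from the thread (hypothetical file name).
cat > raw.txt <<'EOF'
epikey|hrg|code1|code2|code3
1|A0123|D100|V123|K166
2|A0125|D200|"|G122
3|B0101|D300|"|C333
4|B0122|D400|E002|V777
EOF

# Replace every double quote with @ and write the result to a new file,
# leaving the original file untouched.
sed 's/"/@/g' raw.txt > clean.txt

# Show the cleaned file; it should no longer contain any double quotes.
cat clean.txt
```

The cleaned file can then be read with the delimiter option as before, and the @ values recoded to missing afterwards. Quoting rules differ under the Windows Command Prompt, so the sed line may need adjusting there.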
>
> Another option would be to ask the people who sent you the data to
> choose an alternative character/symbol/number for missing data (quite
> why they chose double-quotes in the first place is a mystery only they
> can answer as it has the potential to mess things up, as you've found, by
> virtue of being the character used to encapsulate strings by many
> databases and software).
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
Ada Ma
Research Fellow
Health Economics Research Unit
University of Aberdeen, UK.
http://www.abdn.ac.uk/heru/
Tel: +44 (0) 1224 553863
Fax: +44 (0) 1224 550926