Ada Ma wrote:
> Thanks for the reply. Here is an example I have created which is
> close to what happened. The data should look like this:
>
> epikey hrg code1 code2 code3
> 1 A0123 D100 V123 K166
> 2 A0125 D200 " G122
> 3 B0101 D300 " C333
> 4 B0122 D400 E002 V777
>
> It is pipe delimited so in the text file it looks like this:
>
> epikey|hrg|code1|code2|code3
> 1|A0123|D100|V123|K166
> 2|A0125|D200|"|G122
> 3|B0101|D300|"|C333
> 4|B0122|D400|E002|V777
>
> When I specified the command as you stated above, i.e. specifying the
> delim("|") option, Stata reads in this:
>
> epikey hrg code1 code2 code3
> 1 A0123 D100 V123 K166
> 2 A0125 D200 |G1223|B0101|D300| C333
> 4 B0122 D400 E002 V777
>
> So everything between the double quotes is treated as one string. Is
> there any way to get around this without editing the txt file?
>
>
Hmm, that is problematic, and not quite what I'd expect, but I can see
clearly why it's happening. Stata sees the first double quote and
assumes that it opens a quoted string, so it keeps reading until it
sees the next (closing) double quote, treating any pipes ("|") and
line breaks in between as part of that string.
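The same behaviour can be reproduced outside Stata. The sketch below uses Python's standard csv module purely as an analogue (it is not how Stata itself parses the file). With the default quote character, the lone double quote on the epikey 2 line opens a quoted field that swallows the rest of that line and the start of the next, so two records collapse into one. Passing quoting=csv.QUOTE_NONE makes the quote an ordinary character, and each line stays a separate record. The helper name read_rows is hypothetical.

```python
import csv
import io

# Sample data from the message above: a lone double quote marks a missing code2.
DATA = (
    "epikey|hrg|code1|code2|code3\n"
    "1|A0123|D100|V123|K166\n"
    '2|A0125|D200|"|G122\n'
    '3|B0101|D300|"|C333\n'
    "4|B0122|D400|E002|V777\n"
)


def read_rows(text, quoting):
    """Hypothetical helper: parse pipe-delimited text with the given csv quoting mode."""
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=quoting)
    return list(reader)


# Default quoting: the lone '"' opens a quoted field that runs on into the next line.
merged = read_rows(DATA, csv.QUOTE_MINIMAL)
# QUOTE_NONE: '"' is an ordinary character, so every line stays one record.
plain = read_rows(DATA, csv.QUOTE_NONE)

print(len(merged) - 1, "data rows with default quoting")  # 3 data rows with default quoting
print(len(plain) - 1, "data rows with QUOTE_NONE")        # 4 data rows with QUOTE_NONE
print(merged[2])  # the epikey 2 and 3 lines fused into one record
```

Under default quoting the fused record's code2 field holds "|G122", a newline, and "3|B0101|D300|", which is the same fusion Stata produced.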
I'm afraid I'm not sure how to work around this within Stata. You may gain
some mileage by writing a custom dictionary and reading the file with -infile-.
Personally, I would make a system call to the common *NIX-like command
'sed' to search for and replace any double quotes. This has the
advantage of being automated, as the system call can be placed in your
do-file (as opposed to manually opening the file in your text editor
and doing the search and replace). It has the disadvantage of not being
handled internally by Stata, which makes it somewhat less platform
neutral: it would probably work fine on Linux and Macs, but you'd need
some trickery to call sed from a Cygwin installation under Windows.
I've done it in the past, but can't quite remember the finer details.
There may be a similar command (or indeed a native version of sed) for
the Windows Command Prompt, but I'm not aware of one.
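As a rough illustration of that preprocessing step, here is a minimal sketch in Python, standing in for sed (it is not something Stata provides). It assumes, as in your example, that a field containing only a double quote always means "missing" and that genuine values never contain quotes. The helper names blank_lone_quotes and clean_file are hypothetical. Unlike a blanket sed 's/"//g', it only blanks fields that consist solely of a quote. Either version could be run from a do-file with Stata's shell (or !) command, after which the cleaned file can be read with the same delim("|") option as before.

```python
def blank_lone_quotes(text, delimiter="|"):
    """Return text with every field that is exactly '"' replaced by an empty field.

    Hypothetical helper; assumes '"' is used only as a missing-value marker.
    """
    cleaned_lines = []
    for line in text.splitlines():
        fields = line.split(delimiter)
        # A field holding nothing but a double quote becomes missing (empty).
        fields = ["" if field == '"' else field for field in fields]
        cleaned_lines.append(delimiter.join(fields))
    # Keep a trailing newline so the output is still a well-formed text file.
    return "\n".join(cleaned_lines) + ("\n" if cleaned_lines else "")


def clean_file(src_path, dst_path, delimiter="|"):
    """Hypothetical wrapper: read src_path, blank lone quotes, write dst_path.

    Assumes UTF-8 (or plain ASCII) input.
    """
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(blank_lone_quotes(text, delimiter))


if __name__ == "__main__":
    sample = 'epikey|hrg|code1|code2|code3\n2|A0125|D200|"|G122\n'
    print(blank_lone_quotes(sample), end="")
    # prints:
    # epikey|hrg|code1|code2|code3
    # 2|A0125|D200||G122
```

Blanking only whole-field quotes leaves the door open for legitimately quoted values elsewhere in the file, at the cost of being slightly more code than a one-line sed substitution.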
Another option would be to ask the people who sent you the data to
choose a different character, symbol or number for missing data. Quite
why they chose double quotes in the first place is a mystery only they
can answer, since, as you've found, it has the potential to mess things
up by virtue of being the character that many databases and software
packages use to enclose strings.
Sorry I can't offer any more advice.
Neil
--
"We should make things as simple as possible, but not simpler" - Anon (not Albert Einstein)
***********************************************************************
2008 marks the 60th anniversary of the NHS. It's an opportunity to pay
tribute to the NHS staff and volunteers who help shape the service, and
celebrate their achievements.
If you work for the NHS and would like an NHSmail email account, go
to: www.connectingforhealth.nhs.uk/nhsmail
***********************************************************************
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/