I work with tab-delimited text files in which string variables
sometimes contain double quote marks ("). If the quotes appear in
pairs, -insheet- imports the data but strips the quotes. When a string
contains an unpaired quote mark (i.e., one that is not followed by a
second quote mark), Stata fills that variable up to the maximum string
length of 244 characters and then stops the import, so that all
remaining data in the original file are ignored. The problem can be
reproduced with these three test files:
test1.txt:
row11 row12 row13
row21 row22 row23
row31 row32 row33
test2.txt:
row11 row12 row13
row21 "row"22 row23
row31 row32 row33
test3.txt:
row11 row12 row13
row21 row"22 row23
row31 row32 row33
Each file has three lines of text, and each line has three strings
that are separated by tabs. test1.txt is a tab-delimited text file
without quotes; this file can be imported without problems. test2.txt
is a tab-delimited text file with a pair of quotes; the file is
imported but the quotes are removed. test3.txt has a single, unpaired
quote mark and -insheet- fails. One of my text files has 95,000 lines
and only the first 918 lines are imported because of an unpaired quote
mark in line 918.
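For anyone who wants to recreate the three files with real tab
characters, here is a minimal sketch in Python. It is not part of the
Stata session; the function name write_test_files and the choice of
Python are assumptions made only for illustration. It writes exactly
the contents listed above.

```python
from pathlib import Path

# Contents of the three test files as listed in the post.
# Fields are joined with real tab characters when written.
TEST_FILES = {
    "test1.txt": [
        ["row11", "row12", "row13"],
        ["row21", "row22", "row23"],
        ["row31", "row32", "row33"],
    ],
    "test2.txt": [
        ["row11", "row12", "row13"],
        ["row21", '"row"22', "row23"],  # a pair of quotes
        ["row31", "row32", "row33"],
    ],
    "test3.txt": [
        ["row11", "row12", "row13"],
        ["row21", 'row"22', "row23"],  # one unpaired quote
        ["row31", "row32", "row33"],
    ],
}


def write_test_files(directory="."):
    """Write test1.txt, test2.txt and test3.txt into `directory`.

    Hypothetical helper: returns the list of paths that were written.
    """
    out = Path(directory)
    paths = []
    for name, rows in TEST_FILES.items():
        # One line per row, fields separated by a tab.
        text = "".join("\t".join(fields) + "\n" for fields in rows)
        path = out / name
        path.write_text(text, encoding="ascii")
        paths.append(path)
    return paths


if __name__ == "__main__":
    for written in write_test_files():
        print("wrote", written)
```

Running the script in the working directory creates the files that
the -insheet- commands below read.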
. insheet using test1.txt, clear tab nonames
(3 vars, 3 obs)
. clist
v1 v2 v3
1. row11 row12 row13
2. row21 row22 row23
3. row31 row32 row33
. insheet using test2.txt, clear tab nonames
(3 vars, 3 obs)
. clist
v1 v2 v3
1. row11 row12 row13
2. row21 row22 row23
3. row31 row32 row33
. insheet using test3.txt, clear tab nonames
(3 vars, 2 obs)
. clist
v1 v2 v3
1. row11 row12 row13
2. row21 22 row23 row31 row32 row33
How can the data be imported into Stata with all observations and,
preferably, with the quotes preserved, whether they are unpaired or
in pairs? I can open the files in a text editor, look for quotes that
do not appear in pairs, and remove them manually, but this is
inefficient and changes the original data.
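The manual search for unpaired quotes can at least be automated
without touching the file. The following is only a sketch under
stated assumptions: it is written in Python rather than Stata, the
function name lines_with_unpaired_quotes is hypothetical, and it
treats a line as suspect when it contains an odd number of double
quote marks, which assumes that a quoted string never spans several
lines. It does not solve the import problem itself.

```python
def lines_with_unpaired_quotes(path, quote='"'):
    """Return the 1-based numbers of lines with an odd count of `quote`.

    The file is opened read-only, so the original data are unchanged.
    latin-1 is an assumed encoding, chosen because it decodes any byte.
    """
    flagged = []
    with open(path, encoding="latin-1", newline="") as handle:
        for number, line in enumerate(handle, start=1):
            # An odd count means at least one quote has no partner
            # on the same line.
            if line.count(quote) % 2 == 1:
                flagged.append(number)
    return flagged


if __name__ == "__main__":
    import sys

    # Usage: python find_quotes.py test3.txt
    for line_number in lines_with_unpaired_quotes(sys.argv[1]):
        print(line_number)
```

Applied to a file like test3.txt, this reports the line on which
-insheet- stops, so the offending lines can be inspected before
deciding how to handle them.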
Thanks,
Friedrich