I encountered the following problem:
I'm using the following command to import the data of a tab-delimited
text file into Stata:
--------------------------------------------------------------------
insheet using "file.txt", tab clear
--------------------------------------------------------------------
"file.txt" contains data delimited by tabs, the first row contains the
following names of the variables (also separated by tabs):
--------------------------------------------------------------------
recfile time LfdNr field note
--------------------------------------------------------------------
Except for "LfdNr" all variables should be string variables.
Each row thus consists of five "values" (better: "columns") separated by
four tabs. An example of the data of a row is as follows (to show what
the data look like, in this mail I separate each "column" of the row by
a line break; in the data file they are separated by tabs, of course):
--------------------------------------------------------------------
D:\DATENEINGABE\HH08\HH08_SF9_05.REC
20 Dez 2008 15:43
570
vermnb
.; #2-3
--------------------------------------------------------------------
The problem: In some rows the last "column" (here containing ".; #2-3")
contains double quotes ("). Sometimes these do not occur in pairs
enclosing other characters but as a single, unmatched quote. In that
case -insheet- does not start a new case at the next row of data but
continues to read the contents of the text file into the variable
"note". Only when another unmatched double quote occurs in a later row
does -insheet- resume creating a new case for each row.
For example, if a row contains the following data (again, in this mail
separated by line breaks instead of tabs to show clearly what the data
look like):
--------------------------------------------------------------------
D:\DATENEINGABE\HH08\HH08_SF9_05.REC
13 Dez 2008 14:37
325
glaeubig
97; "#4-5
--------------------------------------------------------------------
then, ignoring line breaks and tabs, all data of the text file starting
with '97; "#4-5' will be read into the variable "note" until another
line of the text file contains a string with only one double quote, such
as
--------------------------------------------------------------------
D:\DATENEINGABE\HH08\HH08_SF9_05.REC
15 Dez 2008 14:05
373
beten
.; "2-3
--------------------------------------------------------------------
(of course, the length of the string variable "note" is automatically
restricted to 244 characters and everything beyond that is lost, but
this is not the issue).
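To make my guess about what happens concrete, here is a toy sketch in
Python (not Stata). It only models my assumption of how a quote-aware
reader might behave; it is not -insheet-'s actual implementation. The
function name split_records, the shortened file name and the row with
LfdNr 350 are made up for illustration.

```python
# Toy model (an assumption, not insheet's real algorithm): a reader that
# treats every double quote as toggling "inside a quoted string", so tabs
# and line breaks lose their meaning until the next double quote.

def split_records(text, quote_aware):
    """Split text into records (lists of tab-separated fields)."""
    records, fields, buf = [], [], []
    in_quotes = False
    for ch in text:
        if quote_aware and ch == '"':
            in_quotes = not in_quotes   # unmatched quote flips the state
            buf.append(ch)
        elif ch == "\t" and not in_quotes:
            fields.append("".join(buf))  # tab ends the current field
            buf = []
        elif ch == "\n" and not in_quotes:
            fields.append("".join(buf))  # line break ends the record
            buf = []
            records.append(fields)
            fields = []
        else:
            buf.append(ch)               # ordinary data character
    if buf or fields:                    # flush a last line without "\n"
        fields.append("".join(buf))
        records.append(fields)
    return records


# Header plus three data rows; the row with LfdNr 350 is hypothetical.
SAMPLE = (
    'recfile\ttime\tLfdNr\tfield\tnote\n'
    'HH08_SF9_05.REC\t13 Dez 2008 14:37\t325\tglaeubig\t97; "#4-5\n'
    'HH08_SF9_05.REC\t14 Dez 2008 10:00\t350\tvermnb\t.; #2-3\n'
    'HH08_SF9_05.REC\t15 Dez 2008 14:05\t373\tbeten\t.; "2-3\n'
)

if __name__ == "__main__":
    for mode in (False, True):
        recs = split_records(SAMPLE, quote_aware=mode)
        print("quote_aware =", mode, "->", len(recs), "records")
```

With quote_aware=False the sample yields the header and three data rows
of five fields each, which is what I expected from a tab-delimited
import. With quote_aware=True the three data rows collapse into a single
record whose last field spans the line breaks, which resembles what I
see in the variable "note".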
To my mind a tab-delimited file is a tab-delimited file, i.e. data will
be read as *separated* by tabs (and/or line breaks). Obviously,
-insheet- does not respect the tabs as delimiters in all instances.
Is this the intended behavior of -insheet-, which I simply don't
understand, or is it a bug? What should I do if it is the former?
Yours,
Dirk
*************************************************
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Schlueterstr. 28
D-20146 Hamburg
Germany
phone: +49-(0)40-42838.7498 (office)
+49-(0)40-42838.4591 (Mrs Billon)
fax: +49-(0)40-42838.2344
email: [email protected]
www:
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
*************************************************