st: Does -insheet- read data incorrectly?

Dirk Enzmann
Thu, 12 Mar 2009 22:26:18 +0100

I encountered the following problem:

I'm using the following command to import a tab-delimited text file into Stata:

insheet using "file.txt", tab clear

"file.txt" contains tab-delimited data; its first row contains the following variable names (also separated by tabs):

recfile time LfdNr field note

Except for "LfdNr", all variables should be string variables.

In each row the five "values" (better: "columns") are separated by four tabs. An example of one row follows (to show what the data look like, in this mail I put each "column" of the row on its own line; in the data file they are, of course, separated by tabs):

20 Dez 2008 15:43
.; #2-3

The problem: in some rows the last "column" (here containing ".; #2-3") contains double quotes ("), but sometimes they do not occur in pairs enclosing other characters; instead they appear as single, unmatched quotes. When this happens, -insheet- does not start a new case at the next row of data but keeps reading the text file into the variable "note". Only when another single double quote occurs in a later row does -insheet- resume creating new cases from new rows.

For example, if a row contains the following data (again, in this mail separated by line breaks instead of tabs to show clearly what the data look like):

13 Dez 2008 14:37
97; "#4-5

all data in the text file from 97; "#4-5 onward will be read into the variable "note", ignoring line breaks and tabs, until another line of the text file contains a string with only one double quote, such as

15 Dez 2008 14:05
.; "2-3

(of course, the string variable "note" is automatically truncated to 244 characters and everything beyond that is lost, but this is not the issue).
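To make the symptom concrete, here is a minimal Python sketch contrasting a reader that treats every double quote as opening or closing a quoted field with one that splits strictly on line breaks and tabs. This is a simplified model inferred from the behavior described above, not -insheet-'s actual code. The function names, the recfile/LfdNr/field values and the last sample row are made up for illustration, and the model assumes the quote characters themselves are discarded.

```python
def split_quote_aware(text, delim="\t"):
    """Hypothetical model: any double quote toggles a 'quoted' state.

    While the state is open, tabs and line breaks are kept as ordinary
    characters of the current field instead of ending it.
    """
    rows, row, field = [], [], []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes  # assumption: the quote itself is dropped
        elif ch == delim and not in_quotes:
            row.append("".join(field))
            field = []
        elif ch == "\n" and not in_quotes:
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
    if field or row:  # last row without a trailing line break
        row.append("".join(field))
        rows.append(row)
    return rows


def split_plain(text, delim="\t"):
    """Split strictly on line breaks and tabs; quotes are ordinary characters."""
    return [line.split(delim) for line in text.splitlines()]


# Sample file: header plus four rows (r1..r4, 1..4, "f" and the last row are made up).
sample = (
    "recfile\ttime\tLfdNr\tfield\tnote\n"
    'r1\t13 Dez 2008 14:37\t1\tf\t97; "#4-5\n'
    "r2\t20 Dez 2008 15:43\t2\tf\t.; #2-3\n"
    'r3\t15 Dez 2008 14:05\t3\tf\t.; "2-3\n'
    "r4\t21 Dez 2008 09:00\t4\tf\t.; #2-3\n"
)

print(len(split_quote_aware(sample)), len(split_plain(sample)))
```

With this sample, the quote-toggling reader returns three rows (the header, one row whose "note" has swallowed rows r1 to r3, and r4), whereas the plain tab splitter returns five rows of five columns each.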

To my mind, a tab-delimited file is a tab-delimited file, i.e. data will be read as *separated* by tabs (and/or line breaks). Obviously, -insheet- does not respect the tabs as delimiters in all instances.

Is this correct behavior of -insheet- that I am misunderstanding, or is it a bug? If it is the former, what should I do?
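One possible workaround, assuming the double quotes in "note" carry no information needed for the analysis, is to remove them from the file before calling -insheet-. A minimal Python sketch follows; the function name, the output file name and the Latin-1 encoding are assumptions, not part of the original setup.

```python
def strip_double_quotes(src_path, dst_path, encoding="latin-1"):
    """Copy src_path to dst_path with every double quote removed.

    Tabs and line breaks are left untouched (newline="" preserves the
    original line endings), so each input line stays one row with the
    same number of tab-separated columns.
    """
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(line.replace('"', ""))
```

For example, strip_double_quotes("file.txt", "file_noquotes.txt") followed in Stata by insheet using "file_noquotes.txt", tab clear. The trade-off is that any meaningful quotes in "note" are lost as well.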


