I don't know whether this can be considered a bug. It seems that quotes
take priority over the delimiter. You could replace them with -filefilter-.
This may not be the most efficient way, but it keeps the data
structured. I was not able to keep the quotes directly, so I first replace
them with a placeholder string and then restore them with the string
function -subinstr-. When I tried to do that with -filefilter- in one
step, I lost the quotes; I don't know why.
***********************
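* \Q is the -filefilter- code for a double quote; SINGLEQUOTE is only a placeholder
filefilter file.txt file_cleaned.txt, from(\Q) to(SINGLEQUOTE)
insheet using file_cleaned.txt, clear
* restore the quotes; use v5 instead of note if the header row is not read as names
replace note = subinstr(note,"SINGLEQUOTE",`"""',.)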
***********************
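For anyone who wants to try this without the original data, here is a
minimal sketch that writes a small tab-delimited test file containing
unpaired double quotes and then applies the workaround above. The file
names demo.txt and demo_cleaned.txt, the handle name fh and the three
data rows are made up for illustration; char(34) is the double-quote
character. The first -count- is there to show whether rows get merged on
your installation; the two -assert- lines only check the cleaned import.
***********************
* write a three-row test file; rows 1 and 3 contain a single double quote
file open fh using demo.txt, write text replace
file write fh "recfile" _tab "note" _n
file write fh "a.rec" _tab ("97; " + char(34) + "#4-5") _n
file write fh "b.rec" _tab ".; #2-3" _n
file write fh "c.rec" _tab (".; " + char(34) + "2-3") _n
file close fh

* direct import: check how many of the three rows survive
insheet using demo.txt, tab names clear
count

* workaround: placeholder in, import, placeholder out
filefilter demo.txt demo_cleaned.txt, from(\Q) to(SINGLEQUOTE) replace
insheet using demo_cleaned.txt, tab names clear
assert _N == 3
replace note = subinstr(note, "SINGLEQUOTE", char(34), .)
assert strpos(note, "SINGLEQUOTE") == 0
list
***********************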
HTH,
Johannes
----------------------
Johannes Geyer
Deutsches Institut für Wirtschaftsforschung (DIW Berlin)
German Institute for Economic Research
Department of Public Economics
DIW Berlin
Mohrenstraße 58
10117 Berlin
Tel: +49-30-89789-258
[email protected] wrote on 12/03/2009 22:26:18:
> I encountered the following problem:
>
> I'm using the following command to import the data of a tab-delimited
> text file into Stata:
>
> --------------------------------------------------------------------
> insheet using "file.txt", tab clear
> --------------------------------------------------------------------
>
> "file.txt" contains data delimited by tabs, the first row contains the
> following names of the variables (also separated by tabs):
>
> --------------------------------------------------------------------
> recfile time LfdNr field note
> --------------------------------------------------------------------
>
> Except for "LfdNr" all variables should be string variables.
>
> In each row the five "values" (better: "columns") are separated by four
> tabs. An example of the data of a row is as follows (to show what the
> data look like, in this mail I separate each "column" of the row by
> using a line break; in the data file they are separated by tabs, of course):
>
> --------------------------------------------------------------------
> D:\DATENEINGABE\HH08\HH08_SF9_05.REC
> 20 Dez 2008 15:43
> 570
> vermnb
> .; #2-3
> --------------------------------------------------------------------
>
> The problem: In some rows the last "column" (here containing ".; #2-3")
> contains double quotes ("), but sometimes they don't occur in pairs
> enclosing other characters but as lonesome singles. If this is the case,
> -insheet- does not start the new case with the new row of data but
> continues to read the data of the text-file into the variable "note".
> Only if again a single double quote occurs in a row of data, -insheet-
> continues to create new cases by reading new rows.
>
> For example, if a row contains the following data (again, in this mail
> separated by line breaks instead of tabs to show clearly what the data
> look like):
>
> --------------------------------------------------------------------
> D:\DATENEINGABE\HH08\HH08_SF9_05.REC
> 13 Dez 2008 14:37
> 325
> glaeubig
> 97; "#4-5
> --------------------------------------------------------------------
>
> then, ignoring line breaks and tabs, all data of the text file starting
> with 97; "#4-5 will be read into the variable "note" until another line
> of the text file contains a string with only one double quote, such as
>
> --------------------------------------------------------------------
> D:\DATENEINGABE\HH08\HH08_SF9_05.REC
> 15 Dez 2008 14:05
> 373
> beten
> .; "2-3
> --------------------------------------------------------------------
>
> (of course, the length of the string variable "note" will automatically
> be restricted to 244 and everything which exceeds this will be lost, but
> this is not the issue).
>
> To my mind a tab-delimited file is a tab-delimited file, i.e. data will
> be read as *separated* by tabs (and/or line breaks). Obviously,
> -insheet- does not respect the tabs as delimiters in all instances.
>
> Is this correct behavior of -insheet- that I simply don't understand,
> or is it a bug? What should I do if it is the former?
>
> Yours,
> Dirk
>
> *************************************************
> Dr. Dirk Enzmann
> Institute of Criminal Sciences
> Dept. of Criminology
> Schlueterstr. 28
> D-20146 Hamburg
> Germany
>
> phone: +49-(0)40-42838.7498 (office)
> +49-(0)40-42838.4591 (Mrs Billon)
> fax: +49-(0)40-42838.2344
> email: [email protected]
> www: http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
> *************************************************
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/