If the file has multiple spaces between the variables, but the same number of
spaces in each record (i.e., every record has the same format), then you should
be able to read it (at least into SPSS; I don't know about Stata) using the
old Fortran-style input format. E.g., 3F2.1 6X F3.0 means: read three variables
of 2 digits (or string length 2) each, then skip 6 columns and read a 3-digit
variable. I know it is something like this, but it has been a while since I
have done it.
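
For the Stata side, a rough counterpart of this column-based read might be
-infix-, which takes explicit column ranges. The sketch below is only an
illustration: the file name fixed_demo.txt, the variable names a-d and the
column positions are assumptions chosen to mirror the 3F2.1 6X F3.0 layout,
and the digits are read as they appear (no implied decimal place is applied).

```stata
* Sketch only: file name, variable names and column positions are assumptions.
* Write a small fixed-width demo file so the example is self-contained.
tempname fh
file open `fh' using fixed_demo.txt, write text replace
file write `fh' "112233      444" _n
file write `fh' "556677      888" _n
file close `fh'

* Columns 1-6 hold three 2-character fields, columns 7-12 are skipped,
* and columns 13-15 hold one 3-digit field.
clear
infix a 1-2 b 3-4 c 5-6 d 13-15 using fixed_demo.txt
list, clean
```

This only works when every record lines up in the same columns, which is the
condition stated above.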
Don Spady
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Joseph Coveney
Sent: Thursday, August 18, 2005 10:11 PM
To: Statalist
Subject: Re: st: read text file with multiple spaces
Yu Zhang wrote:
It's a shame to ask, but does anyone know how to read
data (text file) with multiple spaces between
variables? The number of spaces may vary, so I cannot
use:
. insheet using file, delim(" ")
The only way I figured out is to count the number of
variables first (e.g., using Perl) and then use:
. infile var1-var# using file
Is there a more direct way?
----------------------------------------------------------------------------
My guess would be to do the same in Stata as you would do in Perl to
identify variables.
For example, if there is only a single space between tokens within any
string variable, and there are at least two spaces (maybe more) between
each pair of variables, then:
1. insheet into Stata into a single string variable (mind the limit for
string variable length),
2. use Stata's limited regular expressions capability to convert multiple
spaces to a convenient delimiter (choose one not otherwise present in the
string variables' data),
3. convert multiple delimiters to single delimiters (mind blank cells),
4. export the delimited dataset as an ASCII spreadsheet from Stata (using
the -noquote- option) to a temporary file, and then
5. re-import the delimited spreadsheet into Stata.
Joseph Coveney
* Creating demonstration spreadsheet
clear
set more off
set obs 3
generate str var1 = "column1  column2  column3"
replace var1 = ///
"This is the first column.  This is the second column.  " ///
+ "This is the third column." in 2
replace var1 = ///
"The first-second is two spaces.  " ///
+ "The second-third is four spaces.    " in 3
* Check these last lines above--they might have line-wrapped
* in the e-mail handler.
outsheet using space_delimited_text_spreadsheet.prn, noname noquote
clear
*
* Begin here
*
insheet using space_delimited_text_spreadsheet.prn
* Two or more spaces mark a column break: turn each pair into "; "
replace v1 = subinstr(v1, "  ", "; ", .)
* Four spaces yield "; ; ", so collapse the doubled delimiter
replace v1 = subinstr(v1, "; ; ", "; ", .)
tempfile tmpfil0
outsheet using `tmpfil0', nonames noquote
insheet using `tmpfil0', names delimiter(";") clear
erase `tmpfil0'
list, clean
exit
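
A possible variant of steps 2 to 5 that stays in memory, skipping the
temporary file, is to split the single string variable with -split-. This is
a sketch under assumptions, not part of the do-file above: the demo strings,
the ";" delimiter and the stub name col are made up for illustration, and, as
in step 3, blank cells are not preserved.

```stata
* Sketch only: demo data, the ";" delimiter and the stub "col" are assumptions.
clear
set obs 2
generate str40 v1 = "a b  c d    e" in 1
replace v1 = "one  two    three" in 2

* Turn every pair of spaces into the delimiter.
replace v1 = subinstr(v1, "  ", ";", .)

* Collapse runs of the delimiter until none are left.
local more 1
while `more' {
    replace v1 = subinstr(v1, ";;", ";", .)
    count if strpos(v1, ";;") > 0
    local more = r(N)
}

* Break the string into col1, col2, ... at each delimiter.
split v1, parse(";") generate(col)
list col*, clean
```

With an odd number of spaces between two columns, one blank is left over at
the start of the next piece, so a trim() pass over the new variables may be
needed.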
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*