How can significant leading and trailing (non-embedded) spaces in string
variables be retained when importing fixed-width ASCII files?
-infile- with a dictionary strips both the leading and trailing spaces of
string variables, and -infix- does the same. An example of when this is
inappropriate is illustrated below. The example is a toy version of a real
data-management task in which long-string variables are read into Stata (in
pieces of 244 characters or fewer) from fixed-width ASCII files, manipulated
as needed using Stata, and then exported to an ODBC-compliant application
where they are reassembled by concatenation using SQL.
In the past, I've used -filefilter- to substitute char(160) as a
placeholder (space-holder), imported the substituted file, and then
used -subinstr()- to back out the substitutions. This workaround is also
illustrated below. I can't help thinking that there must be a much more
straightforward approach that I'm overlooking. (Given the larger task at
hand, employing -file- or Mata's -cat()- would seem at least as convoluted a
workaround as the character-substitution one.)
The user manual implies that at least leading blanks will always be skipped
by -infile-. It seems as if this problem could have come up before on the
list, but a search on the keywords "spaces" and "fixed width" didn't turn
anything up.
Joseph Coveney
---------------begin Mary.dat-------------------
1234567890123456789012345678901234567
Mary had a little lamb. Its 1
fleece was white as snow, and 2
everywhere that Mary went, the 3
lamb was sure to go. 4
----------------end Mary.dat-------------------
----------------begin Mary.dct-----------------
infile dictionary using Mary.dat {
    _firstlineoffile(2)
    str5 a_01 %5s
    str5 a_02 %5s
    str5 a_03 %5s
    str5 a_04 %5s
    str5 a_05 %5s
    str5 a_06 %5s
    str5 a_07 %5s "I'm all blank; -drop- me"
    str2 line %2s "I'm a number; -destring- me"
}
------------------end Mary.dct-----------------
---------------begin Mary.do-------------------
* Doesn't work
quietly infile using Mary.dct, clear
generate str A = a_01 + a_02 + a_03 + a_04 + a_05 + a_06
list A, noobs separator(0)
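* Workaround: swap each space for char(160) before import, then swap back
tempfile tmpfil0
filefilter Mary.dat "`tmpfil0'", from(" ") to(\160d)
quietly infile using Mary.dct, using("`tmpfil0'") clear
foreach var of varlist _all {
    * Drop variables that consist entirely of placeholders (all-blank columns)
    capture assert indexnot(`var', char(160)) == 0
    if !_rc {
        drop `var'
        continue
    }
    * Restore the original spaces wherever a placeholder remains
    capture assert strpos(`var', char(160)) == 0
    local has_it = _rc
    while `has_it' {
        quietly replace `var' = subinstr(`var', char(160), " ", .)
        capture assert strpos(`var', char(160)) == 0
        local has_it = _rc
    }
    * Convert all-numeric strings (here, the line number) to numeric
    quietly destring `var', replace
}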
generate str A = a_01 + a_02 + a_03 + a_04 + a_05 + a_06
list A, noobs separator(0)
exit
--------------end Mary.do-------------------------
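For comparison, the Mata -cat()- route mentioned above might look roughly
like the sketch below. It is an untested illustration, not part of the
workaround in Mary.do. It assumes the column layout implied by Mary.dct
(text in columns 1-30, line number in columns 36-37), and the variable
names A and line are chosen here only for illustration.

```stata
* Sketch only: read Mary.dat with Mata's cat() and slice by column position
clear
mata:
// cat() returns one string row per line of the file;
// starting at line 2 skips the column ruler
lines = cat("Mary.dat", 2)
st_addobs(rows(lines))
// Columns 1-30 (a_01 through a_06 in Mary.dct) hold the text;
// substr() slices by position, so blanks are not stripped
st_sstore(., st_addvar("str30", "A"), substr(lines, 1, 30))
// Columns 36-37 hold the line number; trim the padding before conversion
st_store(., st_addvar("byte", "line"), strtoreal(strtrim(substr(lines, 36, 2))))
end
list A line, noobs separator(0)
```

Because -substr()- cuts by position rather than parsing tokens, whatever
blanks sit in columns 1-30 are carried into A unchanged. The real task,
with pieces of up to 244 characters, would presumably need one -substr()-
call per piece.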