Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Stripping ASCII characters
From: Ronan Conroy <[email protected]>
To: "<[email protected]>" <[email protected]>
Subject: Re: st: Stripping ASCII characters
Date: Tue, 25 Feb 2014 13:52:57 +0000
Prof. Ronan Conroy
Associate Professor of Biostatistics
RCSI Department of Epidemiology and Public Health Medicine
Royal College of Surgeons in Ireland
Lower Mercer Street, Dublin 2, Ireland
T: 01-402-2431
E: [email protected] W: www.rcsi.ie
On 24 Feb 2014, at 21:03, Thomas, Anthony wrote:
> When insheeting a csv file using Stata 11 - Unix, Stata aborts with the error:
>
> too many variables specified
> error in line 5000000 of file
>
> Output of "hexdump" indicated the file contained control characters
> (^Z), and was in binary format, when it was expected to be ASCII. I
> tried using "filefilter "f1.csv" "f2.csv", from(^Z) to() replace" to
> strip the problem characters, but a hexdump on f2.csv indicates the
> (^Z) are still present. From what I understand ^Z (sub) is used in
> place of a character that cannot be read by Stata, is this the case?
> If so, is there any way to strip these characters from my file prior
> to import?
This is where a good text editor comes in handy. Many have a 'strip non-ASCII' command that does what you want.
I once ended up with 4,500 text files, of which about 10% were corrupted. BBEdit (or TextWrangler, its free lite version) processed the whole lot in a second or two!
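If no such editor is to hand, the same clean-up can be scripted outside Stata. What follows is only a minimal sketch in Python, not anything the original poster or Stata provides. It assumes the file is meant to be plain ASCII, so it drops every byte that is not printable ASCII, a tab, or a line ending (which includes ^Z, byte 0x1A). The function names and the file names f1.csv and f2.csv are illustrative.

```python
# Sketch: strip control and non-ASCII bytes from a file before importing it.
# Assumption: the data should be pure ASCII, so any other byte is debris.

# Bytes to keep: printable ASCII (0x20-0x7E) plus tab, line feed, carriage return.
KEEP = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
# Every other byte value (including ^Z = 0x1A) is deleted.
DELETE = bytes(b for b in range(256) if b not in KEEP)


def strip_non_ascii(data: bytes) -> bytes:
    """Return data with all bytes outside KEEP removed."""
    return data.translate(None, DELETE)


def clean_file(src: str, dst: str, chunk_size: int = 1 << 20) -> int:
    """Copy src to dst without the unwanted bytes; return how many were removed."""
    removed = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Read in chunks so a very large file need not fit in memory.
        while chunk := fin.read(chunk_size):
            cleaned = strip_non_ascii(chunk)
            removed += len(chunk) - len(cleaned)
            fout.write(cleaned)
    return removed


if __name__ == "__main__":
    import os

    # Hypothetical file names, echoing those in the question.
    if os.path.exists("f1.csv"):
        print(clean_file("f1.csv", "f2.csv"), "bytes removed")
```

The cleaned f2.csv could then be insheeted as usual. Note that deleting bytes does not restore whatever they originally stood for, so it is worth checking the rows around the reported error line afterwards.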
r
Ronán Conroy
[email protected]
Associate Professor
Division of Population Health Sciences
Royal College of Surgeons in Ireland
Beaux Lane House
Dublin 2