Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Sergiy Radyakin <[email protected]> |
To | "[email protected]" <[email protected]> |
Subject | Re: st: Stripping ASCII characters |
Date | Tue, 25 Feb 2014 11:40:45 -0500 |
Dear Anthony,

StataCorp designed the -filefilter- command to work with both text and binary files (as the manual declares):
http://www.stata.com/help.cgi?filefilter

It must work with any characters in the file, as long as you can describe them in your do-code. I cannot reproduce the eof defect in -filefilter- that you imply. A test is here:

do http://radyakin.org/statalist/2014/20140225_1130_filefilter_probe.do

It seems that you forgot the "d" letter in your code, and hence -filefilter- is not doing what you expect. If you still insist that there is a defect in -filefilter-, kindly share the isolated sequence of data (a few bytes before and after the character that was not replaced).

Processing a file byte-by-byte is nevertheless a useful technique in cases that -filefilter- does not handle, e.g. replacing a sequence only after a particular other sequence (a semaphore) has been encountered in the file.

Hope this helps.

Best, Sergiy Radyakin

On Tue, Feb 25, 2014 at 10:55 AM, Thomas, Anthony <[email protected]> wrote:
> Hi Ronan and Sergiy,
>
> I'm not sure if my response yesterday made it through to the list; I
> got a bounce notification this morning. In any event, thanks for the
> suggestions. Sergiy: perhaps I am not using filefilter correctly. I
> tried the following:
>
> filefilter "f1.csv" "f2.csv", from(026) to() replace // 026 is ^Z's hex code
>
> filefilter "f1.csv" "f2.csv", from(\255d) to() replace
>
> and
>
> filefilter "f1.csv" "f2.csv", from(^Z) to() replace // which I didn't really expect to work
>
> In all three cases, the number of control characters in hexdump f1.csv
> == number of control characters in hexdump f2.csv. I'll give reading
> the file byte-by-byte a try though. And Ronan, thanks for the
> suggestion. I tried using "sed" (a command-line text streaming
> utility), which removed some of the "^Z" but not all.
>
> Thanks,
>
> Anthony
>
> On Tue, Feb 25, 2014 at 8:52 AM, Ronan Conroy <[email protected]> wrote:
>>
>> Prof. Ronan Conroy
>> Associate Professor of Biostatistics
>>
>> RCSI Department of Epidemiology and Public Health Medicine
>> Royal College of Surgeons in Ireland
>> Lower Mercer Street, Dublin 2, Ireland
>> T: 01-402-2431
>> E: [email protected] W: www.rcsi.ie
>>
>> RCSI DEVELOPING HEALTHCARE LEADERS
>> WHO MAKE A DIFFERENCE WORLDWIDE
>>
>> On 24 Feb 2014, at 21:03, Thomas, Anthony wrote:
>>
>>> When insheeting a csv file using Stata 11 - Unix, Stata aborts with the error:
>>>
>>> too many variables specified
>>> error in line 5000000 of file
>>>
>>> Output of "hexdump" indicated the file contained control characters
>>> (^Z), and was in binary format, when it was expected to be ASCII. I
>>> tried using "filefilter "f1.csv" "f2.csv", from(^Z) to() replace" to
>>> strip the problem characters, but a hexdump on f2.csv indicates the
>>> (^Z) are still present. From what I understand ^Z (sub) is used in
>>> place of a character that cannot be read by Stata; is this the case?
>>> If so, is there any way to strip these characters from my file prior
>>> to import?
>>
>> This is the place where a good text editor comes in handy. Many have a
>> 'strip non-ASCII' command that does what you want.
>>
>> I ended up with 4,500 text files of which about 10% were corrupted.
>> BBEdit (free, lite version=TextWrangler) processed the whole lot in a
>> second or two!
>>
>> r
>>
>> Ronán Conroy
>> [email protected]
>> Associate Professor
>> Division of Population Health Sciences
>> Royal College of Surgeons in Ireland
>> Beaux Lane House
>> Dublin 2
>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
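
To make the point about the "d" letter concrete: -filefilter- describes a byte as \###d (three decimal digits) or \##h (two hexadecimal digits). ^Z (SUB) is byte 26 in decimal and 1A in hexadecimal, so from(026) searches for the literal characters "026", and from(\255d) targets byte 255 rather than ^Z. The do-file below is a minimal sketch, not a tested fix for the file in this thread. Part 1 is the -filefilter- call with the decimal code. Part 2 illustrates the byte-by-byte approach with a semaphore. The output name f3.csv and the choice of the first line feed (byte 10) as the semaphore are hypothetical, picked only for illustration, and the sketch assumes that an empty to("") is accepted as "replace with nothing".

```stata
* Part 1: strip ^Z with -filefilter-.
* ^Z (SUB) is byte 26 in decimal and 1A in hexadecimal.
filefilter "f1.csv" "f2.csv", from("\026d") to("") replace
* Equivalent call using the hexadecimal code:
*   filefilter "f1.csv" "f2.csv", from("\1Ah") to("") replace
display "occurrences removed: " r(occurrences)

* Part 2: byte-by-byte copy that drops ^Z only after a semaphore.
* Hypothetical semaphore: the first line feed (byte 10) in the file.
* Hypothetical output file: f3.csv.
tempname src dst b seen
file open `src' using "f1.csv", read binary
file open `dst' using "f3.csv", write binary replace
scalar `seen' = 0
file read `src' %1bu `b'
while r(eof) == 0 {
    * Write every byte except a ^Z that follows the semaphore.
    if !(`b' == 26 & `seen' == 1) {
        file write `dst' %1bu (`b')
    }
    * Arm the filter once the semaphore byte has been seen.
    if `b' == 10 {
        scalar `seen' = 1
    }
    file read `src' %1bu `b'
}
file close `src'
file close `dst'
```

Running hexdump on f2.csv (or f3.csv) afterwards shows whether any 1a bytes remain. The byte-by-byte loop is likely to be much slower than -filefilter- on a file with millions of lines, so it is worth reserving for cases where the replacement depends on context.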