Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Getting rid of binary codes so I can read in files - reposted
From: Austin Nichols <[email protected]>
To: [email protected]
Subject: Re: st: Getting rid of binary codes so I can read in files - reposted
Date: Wed, 18 Jan 2012 14:02:02 -0500
Orian Brook <[email protected]>:
This looks like two bugs in -filefilter- to me. First, \00h in the from()
option is treated as empty, as you will see if you try to write to a
location where file2 cannot be saved. Stata will tell you:
(from() option is empty, therefore whole operation is irrelevant;
input file will be copied to output file)
before telling you it cannot save file2. Second, if you specify an empty
from() and to() option, Stata freezes up.
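
A possible workaround outside -filefilter-, offered as a sketch rather than
a tested fix for these particular files. The hexdump quoted below counts
2,747,580 binary zeros out of 5,495,160 bytes, which is exactly half. That
is the pattern a two-byte encoding such as UTF-16 would produce for plain
ASCII text (this is an assumption; the post does not state the encoding).
Under that assumption, dropping every zero byte should leave ordinary ASCII
text. The function name strip_null_bytes and the file names are
hypothetical.

```python
def strip_null_bytes(src, dst, chunk_size=1 << 20):
    """Copy src to dst, dropping every 0x00 byte; return how many were dropped.

    Assumes the zeros are padding (e.g. the high bytes of UTF-16 encoded
    ASCII text), not meaningful data. Non-ASCII characters would be damaged.
    """
    removed = 0
    # Binary mode so that no newline or encoding translation takes place.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            cleaned = chunk.replace(b"\x00", b"")
            removed += len(chunk) - len(cleaned)
            fout.write(cleaned)
    return removed
```

Calling strip_null_bytes("file1", "file2") once per exported file and then
trying -insheet using file2- would show whether the zeros were the only
obstacle. The hexdump reports no extended characters, so nothing other than
the padding should be lost if the assumption holds.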
On Wed, Jan 18, 2012 at 9:40 AM, Orian Brook <[email protected]> wrote:
> Not lucky enough to have had any replies so far - is there anyone with any
> suggestions, or shall I just revert to Outlook?
> Thanks
> Orian
>
> Dear all
> I'm analysing administrative data which I've had to export using an online
> database into 105 files. I've previously worked with similar files by
> importing and combining them all in Outlook, then reading into Stata using
> an odbc link, but I'd really like to try to do it all in Stata (so I have
> the do file for repetition/audit trail purposes) but I have some problems.
> The original files have extra EOL characters, and extended ones, which I can
> get rid of using filefilter, but I still can't import the file: using
> insheet I get the correct number of rows and columns, but all cells are
> blank except the first (it has a t in it). I've also tried using infile and
> skipping the first line, to no avail. Running hexdump shows that I have over
> 2 million binary 0s, which I think may be the problem? I tried using the
> command "filefilter file1 file2, from(\00hd) to() replace" to get rid of
> them, but it hangs.
>
> Any help would be very gratefully received. The hexdump is below.
> (apologies, plain text format doesn't allow me to post this in courier or
> something more legible)
>
> Regards
> Orian Brook
>
> Line-end characters                    Line length (tab=1)
>   \r\n (Windows)              26,823     minimum                     2
>   \r by itself (Mac)               0     maximum                   403
>   \n by itself (Unix)              0
> Space/separator characters             Number of lines          26,824
>   [blank]                    107,191   EOL at EOF?                  no
>   [tab]                            0
>   [comma] (,)                509,637   Length of first 5 lines
> Control characters                       Line 1                    403
>   binary 0                 2,747,580     Line 2                    185
>   CTL excl. \r, \n, \t             0     Line 3                    243
>   DEL                              0     Line 4                    245
>   Extended (128-159,255)           0     Line 5                    245
> ASCII printable
>   A-Z                        189,766
>   a-z                        189,754   File format              BINARY
>   0-9                      1,509,729
>   Special (!@#$ etc.)        187,857
>   Extended (160-254)               0
>                      ---------------
>   Total                    5,495,160
>
> Observed were:
>   \0 \n \r blank , - . / 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O
>   P Q R S T U V W X Y Z _ a b c d e f g h i k l m n o p q r s t u v x y