st: Getting rid of binary codes so I can read in files
From: "Orian Brook" <[email protected]>
To: <[email protected]>
Date: Fri, 13 Jan 2012 13:16:15 -0000

Dear all,

I'm analysing administrative data which I've had to export from an online
database into 105 files. I've previously worked with similar files by
importing and combining them all in Outlook, then reading them into Stata
via an ODBC link, but I'd really like to do it all in Stata (so that I have
the do-file for repetition/audit trail purposes). However, I have some
problems.

The original files have extra EOL characters, and extended ones, which I can
get rid of using filefilter, but I still can't import the file: using
insheet I get the correct number of rows and columns, but all cells are
blank except the first (it has a "t" in it). I've also tried using infile
and skipping the first line, to no avail. Running hexdump shows that I have
over 2 million binary 0s, which I think may be the problem. I tried using
the command "filefilter file1 file2, from(\00hd) to() replace" to get rid of
them, but it hangs.

Any help would be very gratefully received. The hexdump is below.
(Apologies: plain text format doesn't allow me to post this in Courier or
something more legible.)

Regards
Orian Brook

  Line-end characters                      Line length (tab=1)
    \r\n         (Windows)        26,823     minimum                    2
    \r by itself (Mac)                 0     maximum                  403
    \n by itself (Unix)                0
  Space/separator characters               Number of lines         26,824
    [blank]                      107,191     EOL at EOF?               no
    [tab]                              0
    [comma] (,)                  509,637   Length of first 5 lines
  Control characters                         Line 1                   403
    binary 0                   2,747,580     Line 2                   185
    CTL excl. \r, \n, \t               0     Line 3                   243
    DEL                                0     Line 4                   245
    Extended (128-159,255)             0     Line 5                   245
  ASCII printable
    A-Z                          189,766
    a-z                          189,754   File format             BINARY
    0-9                        1,509,729
    Special (!@#$ etc.)          187,857
    Extended (160-254)                 0
                         ---------------
    Total                      5,495,160

  Observed were:
    \0 \n \r blank , - . / 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O
    P Q R S T U V W X Y Z _ a b c d e f g h i k l m n o p q r s t u v x y
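
The report above lists 2,747,580 binary 0 bytes out of 5,495,160 in total,
i.e. exactly half the file, and every other observed character is plain
ASCII. That pattern is consistent with a two-byte encoding such as UTF-16,
though the report alone does not prove it. As a minimal sketch of one way to
drop the zero bytes outside Stata, assuming the remaining bytes really are
plain ASCII text, the following Python function copies a file while removing
every 0x00 byte. The function name strip_nul_bytes is hypothetical and not
part of the original post.

```python
def strip_nul_bytes(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, dropping every NUL (0x00) byte.

    Assumption: the non-NUL bytes are plain ASCII text, so removing the
    NULs leaves a readable comma-separated file.
    Returns the number of NUL bytes removed.
    """
    removed = 0
    # Binary mode so that no newline or encoding translation takes place.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)  # read in chunks to bound memory use
            if not chunk:
                break
            cleaned = chunk.replace(b"\x00", b"")
            removed += len(chunk) - len(cleaned)
            dst.write(cleaned)
    return removed
```

For example, strip_nul_bytes("file1", "file2") would write a cleaned copy of
file1 to file2 and return the number of bytes dropped, which can be checked
against the "binary 0" count in the hexdump report before trying insheet
again.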