Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: insheet and dropping cases
From
"Ben Hoen" <[email protected]>
To
<[email protected]>
Subject
RE: st: insheet and dropping cases
Date
Thu, 20 Feb 2014 14:19:09 -0500
Hi Sergiy,
I am pasting in the tabulate from hexdump (not knowing how to provide a link
to those files as you suggest):
Tabulation (character not listed if unobserved):
Dec Hex Char Frequency
------------------------------
010 0a \n 364
013 0d \r 364
032 20 blank 9,621
033 21 ! 9
034 22 " 5
035 23 # 21
038 26 & 202
039 27 ' 135
040 28 ( 30
041 29 ) 29
042 2a * 4
043 2b + 7
044 2c , 112
045 2d - 3,378
046 2e . 282
047 2f / 337
048 30 0 157,131
049 31 1 18,056
050 32 2 13,187
051 33 3 8,837
052 34 4 8,087
053 35 5 6,803
054 36 6 7,456
055 37 7 6,283
056 38 8 6,322
057 39 9 6,333
058 3a : 98
059 3b ; 24
064 40 @ 1
065 41 A 5,418
066 42 B 1,197
067 43 C 3,167
068 44 D 2,399
069 45 E 5,718
070 46 F 1,067
071 47 G 1,612
072 48 H 1,597
073 49 I 4,112
074 4a J 300
075 4b K 873
076 4c L 4,877
077 4d M 1,693
078 4e N 4,099
079 4f O 4,254
080 50 P 1,343
081 51 Q 149
082 52 R 4,634
083 53 S 3,272
084 54 T 3,756
085 55 U 1,162
086 56 V 865
087 57 W 1,488
088 58 X 151
089 59 Y 1,369
090 5a Z 67
095 5f _ 726
097 61 a 817
098 62 b 73
099 63 c 147
100 64 d 323
101 65 e 887
102 66 f 65
103 67 g 199
104 68 h 189
105 69 i 498
107 6b k 233
108 6c l 419
109 6d m 111
110 6e n 616
111 6f o 872
112 70 p 107
113 71 q 4
114 72 r 581
115 73 s 252
116 74 t 390
117 75 u 132
118 76 v 172
119 77 w 74
120 78 x 13
121 79 y 90
122 7a z 7
124 7c | 72,436
125 7d } 35
------------------------------
Total 394,625
It is not clear to me what the problem characters are - unprintable/special or
not - but I tried replacing the "}" character (and the comma previously) to
no avail.
Separately I think I isolated the fields that contain the problems. Is
there a way to ignore/remove individual fields in a txt file from within
Stata?
Thank you for your efforts in helping me with this issue.
Ben
Ben Hoen
LBNL
Office: 845-758-1896
Cell: 718-812-7589
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Sergiy Radyakin
Sent: Thursday, February 20, 2014 1:28 PM
To: [email protected]
Subject: Re: st: insheet and dropping cases
Ben,
-- the problem is likely caused by the presence of unprintable characters
in the file, which are tolerated by Stat/Transfer but not by Stata;
-- the character with ASCII code 255 is a usual suspect;
-- pasting raw data to Statalist is unlikely to reveal the problem,
since the special characters might not survive being passed through email;
-- isolating the problem in a text editor into a new file could help
(keep the last record read in correctly and the one immediately after);
then make the file available through a link to retain its binary
structure, since not all text editors retain special characters on save;
-- use hexdump "file", analyze tabulate to see unprintable
characters, then search for them in the file or use filefilter;
-- see "zap gremlins" for a relevant tactic.
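The hexdump and filefilter steps above could be sketched roughly as follows. This is only an illustration: the file names are hypothetical, byte 255 is used because it is the "usual suspect" named above, and the byte is replaced with a blank rather than deleted so that field positions are disturbed as little as possible. Note that the tabulation posted at the top of this message lists no byte above decimal 125, so for that particular file this filter would change nothing.

```stata
* Hypothetical file names (assumptions); substitute the real paths.
local infile  "mydata.txt"
local outfile "mydata_clean.txt"

* Tabulate every byte value that occurs in the original file.
hexdump "`infile'", analyze tabulate

* Write a copy in which each byte with decimal code 255 becomes a blank.
* \255d is filefilter's notation for a three-digit decimal character code.
filefilter "`infile'" "`outfile'", from("\255d") to(" ") replace

* Tabulate the cleaned copy to confirm that the byte is gone.
hexdump "`outfile'", analyze tabulate
```

filefilter reports how many occurrences it changed, which is a quick way to tell whether the suspected byte was present at all.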
On the bright side: you are lucky you have only 363 cases. Last time I had
this problem, only 16 GB out of 40 GB were read in. Try to open that
file in Notepad :)
Hope this helps.
Best, Sergiy Radyakin
On Thu, Feb 20, 2014 at 12:34 PM, Radwin, David <[email protected]> wrote:
> One other possibility is to use -inputst-, a Stata program that calls
> Stat/Transfer (part of -stcmd- by Roger Newson and available at SSC).
>
> This workaround is probably less computationally efficient than the
> suggestions from others, but since you already know that Stat/Transfer
> works, this approach might be faster and easier than trying to figure out
> the problem with your text files and -insheet- or -import delimited-.
>
> David
> --
> David Radwin, Senior Research Associate
> Education and Workforce Development
> RTI International
> 2150 Shattuck Ave. Suite 800, Berkeley, CA 94704
> Phone: 510-665-8274
>
> www.rti.org/education
>
>
>> -----Original Message-----
>> From: [email protected] [mailto:owner-
>> [email protected]] On Behalf Of Phil Schumm
>> Sent: Thursday, February 20, 2014 6:38 AM
>> To: Statalist Statalist
>> Subject: Re: st: insheet and dropping cases
>>
>> On Feb 20, 2014, at 8:28 AM, Ben Hoen <[email protected]> wrote:
>> > Hexdump I had never used. This is what it returned:
>>
>> <snip>
>>
>> > Do you see anything suspicious here? (I replaced all the commas with
>> > "_", using filefilter - another great suggestion - wondering if that was
>> > causing any issues and insheet still returned 184 observations.)
>>
>>
>> I don't see anything obvious -- you'll need to look at the file directly.
>> Is Stata reading the first 184 observations, or are the 184 observations
>> from different places in the file? Check that first, and if you are
>> getting the first 184 observations, then look at lines 184-6 (depending on
>> whether the file has a header line). Something has to be going on there.
>>
>>
>> -- Phil
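
One way to look at those lines without leaving Stata is to read the file as raw text and display only the records around the point where the import stops. The sketch below is a minimal illustration under stated assumptions: the file name is hypothetical, and the range 183-187 is chosen only to bracket record 184 whether or not the file has a header line.

```stata
* Hypothetical file name (assumption); substitute the real path.
local infile "mydata.txt"

* Range of raw lines to display (assumption: brackets record 184).
local first 183
local last  187

tempname fh
file open `fh' using "`infile'", read text
local linenum = 1
file read `fh' line
while r(eof) == 0 & `linenum' <= `last' {
    if `linenum' >= `first' {
        * _asis shows the line exactly as read, without interpreting it.
        display %6.0f `linenum' _asis `"  `macval(line)'"'
    }
    local linenum = `linenum' + 1
    file read `fh' line
}
file close `fh'
```

Comparing the last line that was imported correctly with the one after it should show whether a stray quote, delimiter, or other character sits at the break.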
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/