st: Error in chunky
From:    "King, Carina" <[email protected]>
To:      "[email protected]" <[email protected]>
Subject: st: Error in chunky
Date:    Mon, 28 Nov 2011 14:56:11 +0000
Dear Statalist,
I am having trouble with a very large file (about 8 GB). I am using -chunky- to split it into smaller files, but it keeps returning an error:
ftell(): 2144826048 Stata returned error
chunkfile(): - function returned error
<istmt>: - function returned error
r(2144826048);
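For reference, -chunky- was installed from SSC and the call I have been making is along the following lines (the output stub here is only a placeholder, and chunksize() is the option I varied from 1g down to 10m):

. ssc install chunky
. chunky using `"D:\Carina\New NEW dat files\hashed09_new_csv\hashed09_new_csv.csv"', chunksize(100m) stub(D:\Carina\chunks\hashed09_part)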
I have tried several values for the chunk size, from 1G down to 10M; the error appears every time, though at a different position. I have also tried both a .txt and a .csv version of the file and get the same error in both cases. Below are the 'analyze' results from the chunky program and the memory allocation my Stata is set to. Any help on what the error means or how to work around it, or suggestions on how to open this file, would be much appreciated!
Analyzing D:\Carina\New NEW dat files\hashed09_new_csv\hashed09_new_csv.csv for chunking
BINARY is the file type
File has 6352205 lines of average length 1282 bytes
Composition is 11% letters, 56.00000000000001% numbers and 34% other characters
No extended characters present.
Approximate chunk sizes and memory requirements
for -insheet- or -infile- commands
+-------------------------------------------------------------------+
| Chunksize (mb) |   Number of    |    ~Number     |  Stata size*   |
|     option     |     Chunks     |   obs/chunk    |  (megabytes)   |
|----------------+----------------+----------------+----------------|
|             10 |            815 |           7794 |            5.9 |
|             30 |            272 |          23354 |           17.7 |
|            100 |             82 |          77466 |           58.7 |
|            300 |             28 |         226864 |          171.8 |
|           1000 |              9 |         705801 |          534.6 |
|           3000 |              3 |        2117402 |         1603.7 |
+-------------------------------------------------------------------+
* Stata file size is very approximate and depends on datatypes of variables
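Once the split works, the plan is to read each chunk back in with -insheet- and append them into one dataset, roughly as sketched here (this assumes the chunks end up as hashed09_part*.csv in a folder such as D:\Carina\chunks, and that each chunk keeps the variable-name header row; the folder and file names are only illustrative):

* sketch: read each chunk with -insheet- and combine into a single .dta
cd "D:\Carina\chunks"
local files : dir . files "hashed09_part*.csv"
tempfile combined
local first = 1
foreach f of local files {
    insheet using `"`f'"', comma clear
    if !`first' append using `"`combined'"'
    save `"`combined'"', replace
    local first = 0
}
* final combined dataset (file name is just an example)
save hashed09_combined.dta, replace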
. hexdump `"D:\Carina\New NEW dat files\hashed09_new_csv\hashed09_new_csv.csv"', analyze results
  Line-end characters
    \r\n (Windows)                  6,352,205
    \r by itself (Mac)                      0
    \n by itself (Unix)                     0

  Space/separator characters
    [blank]                       608,462,578
    [tab]                                   0
    [comma] (,)                 1,255,193,981

  Control characters
    binary 0                               12
    CTL excl. \r, \n, \t                    1
    DEL                                     0
    Extended (128-159,255)                  0

  ASCII printable
    A-Z                           647,742,983
    a-z                           219,580,832
    0-9                         4,546,540,695
    Special (!@#$ etc.)           854,705,763
    Extended (160-254)                      0
                              ---------------
    Total                       8,144,931,255

  Line length (tab=1)
    minimum                               568
    maximum                             2,631
    Number of lines                 6,352,205
    EOL at EOF?                           yes

  Length of first 5 lines
    Line 1                              2,631
    Line 2                              1,058
    Line 3                              1,053
    Line 4                              1,042
    Line 5                              1,052

  File format                          BINARY
Observed were:
\0 ^C \n \r blank " % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] ^ _ a b c d
e f g h i j k l m n o p q r s t u v w x y z { } ~
Current memory allocation

                    current                                   memory usage
    settable          value     description                   (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed             1.947M
    set memory       10000M     max. data space               10,000.000M
    set matsize       11000     max. RHS vars in models          924.080M
                                                              -----------
                                                              10,926.027M
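(Those limits were set with the usual commands before trying to read anything in, i.e.:)

. set memory 10000M
. set maxvar 5000
. set matsize 11000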
Thanks!
Carina King
PhD Student
The National Centre for Infection Prevention and Management,
Imperial College,
St Dunstans Road,
London