Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Error in chunky
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Error in chunky
Date: Mon, 28 Nov 2011 19:05:48 +0000
I have no solution, only two suggestions. First, please explain that
-chunky- is from SSC. Second, it is better to say "error in using
-chunky-" than "error in -chunky-": the first wording leaves open the
possibility that the problem lies in the file or in the user's syntax,
and not necessarily with the programmer.
Nick
On Mon, Nov 28, 2011 at 2:56 PM, King, Carina <[email protected]> wrote:
> Dear Statalist,
>
> I am having some issues with a very large file (about 8GB). I am using 'chunky' to attempt to split it into smaller files but it keeps coming back with an error:
>
> ftell(): 2144826048 Stata returned error
> chunkfile(): - function returned error
> <istmt>: - function returned error [1]
> r(2144826048);
>
>
> I have tried setting the chunk size to different values, from 1G down to 10M; the error comes up each time, but at a different position. I have also tried both a .txt file and a .csv file, and the same error comes up with each. Below are the 'analyze' results from the chunky programme and the memory allocation that my Stata is set to. Any help on what the error is, how to solve it, or how else to open this file would be much appreciated!
>
> Analyzing D:\Carina\New NEW dat files\hashed09_new_csv\hashed09_new_csv.csv for chunking
>
> BINARY is the file type
> File has 6352205 lines of average length 1282 bytes
> Composition is 11% letters, 56.00000000000001% numbers and 34% other characters
> No extended characters present.
>
> Approximate chunk sizes and memory requirements
> for -insheet- or -infile- commands
> +-----------------------------------------------------------+
> |Chunksize (mb)|  Number of   |   ~Number    | Stata size*  |
> |    option    |    Chunks    |  obs/chunk   | (megabytes)  |
> |--------------+--------------+--------------+--------------|
> |           10 |          815 |         7794 |          5.9 |
> |           30 |          272 |        23354 |         17.7 |
> |          100 |           82 |        77466 |         58.7 |
> |          300 |           28 |       226864 |        171.8 |
> |         1000 |            9 |       705801 |        534.6 |
> |         3000 |            3 |      2117402 |       1603.7 |
> +-----------------------------------------------------------+
> * Stata file size is very approximate and depends on datatypes of variables
>
>
> . hexdump `"D:\Carina\New NEW dat files\hashed09_new_csv\hashed09_new_csv.csv"', analyze results
>
>   Line-end characters                        Line length (tab=1)
>     \r\n (Windows)               6,352,205     minimum                568
>     \r by itself (Mac)                   0     maximum              2,631
>     \n by itself (Unix)                  0
>   Space/separator characters                 Number of lines    6,352,205
>     [blank]                    608,462,578   EOL at EOF?              yes
>     [tab]                                0
>     [comma] (,)              1,255,193,981   Length of first 5 lines
>   Control characters                           Line 1               2,631
>     binary 0                            12     Line 2               1,058
>     CTL excl. \r, \n, \t                 1     Line 3               1,053
>     DEL                                  0     Line 4               1,042
>     Extended (128-159,255)               0     Line 5               1,052
>   ASCII printable
>     A-Z                        647,742,983
>     a-z                        219,580,832   File format           BINARY
>     0-9                      4,546,540,695
>     Special (!@#$ etc.)        854,705,763
>     Extended (160-254)                   0
>                            ---------------
>   Total                      8,144,931,255
>
> Observed were:
> \0 ^C \n \r blank " % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
> ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] ^ _ a b c d
> e f g h i j k l m n o p q r s t u v w x y z { } ~
>
>
>
> Current memory allocation
>
>                                                         current
>                                                         memory usage
> settable       value     description                    (1M = 1024k)
> --------------------------------------------------------------------
> set maxvar      5000     max. variables allowed              1.947M
> set memory    10000M     max. data space                10,000.000M
> set matsize    11000     max. RHS vars in models           924.080M
>                                                         -----------
>                                                         10,926.027M
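
One observation on the numbers quoted above: the position reported by ftell(), 2144826048, is just below 2^31 = 2147483648. That may point to a 32-bit file-offset limit somewhere in the chain, but this is a guess; nothing in this thread confirms it. If so, one workaround is to split the file outside Stata before reading the pieces with -insheet-. What follows is a minimal sketch in Python, not a replacement for -chunky- and not its algorithm. The function name split_by_lines, its arguments, and the output file naming are all made up for illustration. It reads the file in binary mode, so the \r\n line ends and the stray \0 and ^C bytes seen in the hexdump results are copied through unchanged, and it never asks for a file position.

```python
import os


def split_by_lines(src_path, out_dir, max_bytes, header=True):
    """Split src_path into numbered chunk files of at most max_bytes each,
    always breaking at a line end. Returns the list of chunk paths.

    Hypothetical helper for illustration; not part of -chunky- or Stata.
    If header is True, the first line is repeated at the top of every chunk.
    A single line longer than max_bytes still gets written, in its own chunk.
    A file holding only a header line produces no chunks.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_path))[0]
    paths = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        # Keep the variable-name line so each chunk can be read on its own.
        first = src.readline() if header else b""
        for line in src:
            # Start a new chunk when this line would push us past the limit.
            if out is None or written + len(line) > max_bytes:
                if out is not None:
                    out.close()
                name = "%s_%04d.csv" % (base, len(paths) + 1)
                path = os.path.join(out_dir, name)
                out = open(path, "wb")
                paths.append(path)
                out.write(first)
                written = len(first)
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return paths


# Example call (paths are placeholders, not taken from the post):
# chunks = split_by_lines("big.csv", "chunks", max_bytes=1000 * 1024 * 1024)
```

Each resulting file could then be read in turn and the pieces combined with -append-. Whether the first line of this particular file really holds variable names is an assumption; set header=False if it does not.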
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/