st: Error in chunky
From:    "King, Carina" <[email protected]>
To:      "[email protected]" <[email protected]>
Subject: st: Error in chunky
Date:    Mon, 28 Nov 2011 14:56:11 +0000
Dear Statalist,
I am having trouble with a very large file (about 8 GB). I am using -chunky- to split it into smaller files, but it keeps returning an error:
ftell(): 2144826048 Stata returned error
chunkfile(): - function returned error
<istmt>: - function returned error
r(2144826048);
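For reference, -chunky- was installed from SSC and the call I have been making is along the following lines (the output stub here is only a placeholder, and chunksize() is the option I varied from 1g down to 10m):

. ssc install chunky
. chunky using `"D:\Carina\New NEW dat files\hashed09_new_csv\hashed09_new_csv.csv"', chunksize(100m) stub(D:\Carina\chunks\hashed09_part)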
I have tried several values for the chunk size, from 1G down to 10M; the error appears every time, though at a different position. I have also tried both a .txt and a .csv version of the file and get the same error in both cases. Below are the 'analyze' results from the chunky program and the memory allocation my Stata is set to. Any help on what the error means or how to work around it, or suggestions on how to open this file, would be much appreciated!
Analyzing D:\Carina\New NEW dat files\hashed09_new_csv\hashed09_new_csv.csv for chunking
BINARY is the file type
File has 6352205 lines of average length 1282 bytes
Composition is 11% letters, 56.00000000000001% numbers and 34% other characters
No extended characters present.
Approximate chunk sizes and memory requirements
for -insheet- or -infile- commands
+-------------------------------------------------------------------+
| Chunksize (mb) |   Number of    |    ~Number     |  Stata size*   |
|     option     |     Chunks     |   obs/chunk    |  (megabytes)   |
|----------------+----------------+----------------+----------------|
|             10 |            815 |           7794 |            5.9 |
|             30 |            272 |          23354 |           17.7 |
|            100 |             82 |          77466 |           58.7 |
|            300 |             28 |         226864 |          171.8 |
|           1000 |              9 |         705801 |          534.6 |
|           3000 |              3 |        2117402 |         1603.7 |
+-------------------------------------------------------------------+
* Stata file size is very approximate and depends on datatypes of variables
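Once the split works, the plan is to read each chunk back in with -insheet- and append them into one dataset, roughly as sketched here (this assumes the chunks end up as hashed09_part*.csv in a folder such as D:\Carina\chunks, and that each chunk keeps the variable-name header row; the folder and file names are only illustrative):

* sketch: read each chunk with -insheet- and combine into a single .dta
cd "D:\Carina\chunks"
local files : dir . files "hashed09_part*.csv"
tempfile combined
local first = 1
foreach f of local files {
    insheet using `"`f'"', comma clear
    if !`first' append using `"`combined'"'
    save `"`combined'"', replace
    local first = 0
}
* final combined dataset (file name is just an example)
save hashed09_combined.dta, replace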
. hexdump `"D:\Carina\New NEW dat files\hashed09_new_csv\hashed09_new_csv.csv"', analyze results
  Line-end characters
    \r\n (Windows)                  6,352,205
    \r by itself (Mac)                      0
    \n by itself (Unix)                     0

  Space/separator characters
    [blank]                       608,462,578
    [tab]                                   0
    [comma] (,)                 1,255,193,981

  Control characters
    binary 0                               12
    CTL excl. \r, \n, \t                    1
    DEL                                     0
    Extended (128-159,255)                  0

  ASCII printable
    A-Z                           647,742,983
    a-z                           219,580,832
    0-9                         4,546,540,695
    Special (!@#$ etc.)           854,705,763
    Extended (160-254)                      0
                              ---------------
    Total                       8,144,931,255

  Line length (tab=1)
    minimum                               568
    maximum                             2,631
    Number of lines                 6,352,205
    EOL at EOF?                           yes

  Length of first 5 lines
    Line 1                              2,631
    Line 2                              1,058
    Line 3                              1,053
    Line 4                              1,042
    Line 5                              1,052

  File format                          BINARY
Observed were:
\0 ^C \n \r blank " % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] ^ _ a b c d
e f g h i j k l m n o p q r s t u v w x y z { } ~
Current memory allocation

                    current                                   memory usage
    settable          value     description                   (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed             1.947M
    set memory       10000M     max. data space               10,000.000M
    set matsize       11000     max. RHS vars in models          924.080M
                                                              -----------
                                                              10,926.027M
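(Those limits were set with the usual commands before trying to read anything in, i.e.:)

. set memory 10000M
. set maxvar 5000
. set matsize 11000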
Thanks!
Carina King
PhD Student
The National Centre for Infection Prevention and Management,
Imperial College,
St Dunstans Road,
London