st: New version of file-chunking utility -chunky- is available from SSC
From: David Elliott <[email protected]>
To: [email protected]
Subject: st: New version of file-chunking utility -chunky- is available from SSC
Date: Wed, 1 Sep 2010 16:27:28 -0300
Thanks to Kit Baum, a new version of my file-chunking utility -chunky-
is available from SSC. The previous version is still available but has
been deprecated as -chunky8-.

-chunky- has a completely new syntax, so if you have used it
previously you will have to rewrite your routine. However, it now
achieves in a single command what previously required a loop, since the
looping logic is built into the routine. The use of new logic and
Mata subroutines has resulted in speed increases of up to several orders
of magnitude when chunking larger files, depending on hardware and
network configurations. -chunky- can automatically name the
chunk files with a user-specified stub, and provision is made for
handling the header line present in many text output formats. New
pre-chunking file analysis options are available to examine file
structure and help anticipate any infiling problems.
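
For readers who want to see the general idea outside Stata, here is a minimal Python sketch of chunking a text file into serially named parts while repeating the header line in each part. It is not the -chunky- implementation and does not use its syntax. The function name `chunk_text_file`, its parameters, the four-digit numbering, and the `.txt` suffix are all hypothetical, and it splits by line count purely for simplicity.

```python
def chunk_text_file(path, stub, lines_per_chunk, header=True):
    """Split a text file into serially named chunks and return their paths.

    Hypothetical helper for illustration only. If header is True, the first
    line of the source file is copied to the top of every chunk so that each
    chunk can be read on its own.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    chunk_paths = []
    out = None
    written = 0
    try:
        # Binary mode keeps line endings and encodings exactly as they are.
        with open(path, "rb") as src:
            header_line = src.readline() if header else b""
            for line in src:
                # Start a new chunk at the beginning or when the current one is full.
                if out is None or written == lines_per_chunk:
                    if out is not None:
                        out.close()
                    chunk_path = f"{stub}{len(chunk_paths) + 1:04d}.txt"
                    out = open(chunk_path, "wb")
                    out.write(header_line)
                    chunk_paths.append(chunk_path)
                    written = 0
                out.write(line)
                written += 1
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Calling `chunk_text_file("dump.csv", "part", 100000)` on a hypothetical file `dump.csv` would write `part0001.txt`, `part0002.txt`, and so on, each starting with the original header line.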
Known issues:
1. The routine fails on very wide input lines (longer than 32,768
   characters, the limit of Mata's fget()).
2. There are errors in the example code in the notes section of the
   help file. These will be corrected in the next point release.
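
To check in advance whether a file might hit the wide-line issue, something like the following Python sketch could be used. It is an illustration only and is not part of -chunky-. The function name `find_long_lines` is hypothetical, and the sketch assumes the limit is counted in bytes and excludes the line terminator, which may not match exactly how Mata counts.

```python
def find_long_lines(path, limit=32768):
    """Return (line_number, length) pairs for lines longer than limit bytes.

    Hypothetical helper: length is measured in bytes with the trailing
    line terminator removed, which is an assumption about how the limit applies.
    """
    long_lines = []
    with open(path, "rb") as src:
        for number, raw in enumerate(src, start=1):
            length = len(raw.rstrip(b"\r\n"))
            if length > limit:
                long_lines.append((number, length))
    return long_lines
```

An empty list means no line exceeds the limit under these assumptions.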
User feedback is appreciated, especially from Mac users, since I do not
have access to Stata on a Mac.
Thanks to Amresh Hanchate for presenting the initial challenge to
redevelop -chunky- based on problems he was having and to Dan
Blanchette for his testing and diligent error-finding.
DC Elliott
TITLE
'CHUNKY': module to chunk a large text file into smaller parts
DESCRIPTION
chunky breaks a large text file into chunks of a size specified
by the user. It is typically used to break a huge data dump that
is too large for infiling into smaller manageable chunks. chunky
will allow creation of serially named chunks for subsequent
infiling or insheeting. The smaller data subsets can then be
appended together to create a dataset with all required
observations. This version of chunky has been completely
rewritten to use the Mata capabilities of Stata release 9 and
higher and the syntax has completely changed. The previous
version has been deprecated as chunky8. Some users may still
require a line-indexed method of chunking files so chunky8 will
continue to be supported.
TITLE
'CHUNKY8': module to chunk a large text file into smaller parts
(version 8)
DESCRIPTION
chunky8 breaks a large text file into manageable, user-specified
chunks. It is typically used to break a huge data dump
that is too large for infiling into smaller manageable chunks.
chunky8 will allow serial chunking and then infiling or
insheeting. The smaller data subsets can then be appended
together to create a dataset with all required observations.
chunky8 is the deprecated previous version of chunky, the latter
having been completely rewritten to use the Mata capabilities of
Stata release 9 and higher. Some users may still require a
line-indexed method of chunking files so chunky8 will continue to
be supported.