Re: st: Importing subset of a pipe delimited textfile
From: Daniel Feenberg <[email protected]>
To: [email protected]
Subject: Re: st: Importing subset of a pipe delimited textfile
Date: Wed, 17 Oct 2012 07:19:08 -0400 (EDT)
On Wed, 17 Oct 2012, Rob Shaw wrote:
> Hi
>
> I have a very large (around 4 GB) text file that is pipe
> delimited. It won't all fit in memory, so I want to process it in
> parts.
>
> For fixed-format datasets I would use infile with the "in 1/10000000"
> option, then "in 10000001/20000000", etc. However, this dataset is pipe
> delimited, so I would need to use insheet, but insheet doesn't seem to
> permit the "in" option.
>
> Can anyone help please?
I take it that there are commas in the data, so that converting the pipes
to something else with filefilter won't work? Could you convert the commas
to "~"s first? Or does the data already contain "~"s, with no unused
character available at all?
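Whether an unused character exists can be checked before converting anything. The following is a minimal sketch in Python (not Stata), assuming Python is available on the machine; the function name `unused_delimiters` and the default candidate list are hypothetical choices for illustration. It reads the file in chunks, so a 4 GB file never has to fit in memory.

```python
def unused_delimiters(path, candidates="~^\t;", chunk_size=1 << 20):
    """Return the candidate characters that never occur in the file.

    latin-1 is used only because it maps every byte to one character,
    so decoding cannot fail whatever the real encoding is (assumption:
    the candidates are plain ASCII characters).
    """
    remaining = set(candidates)
    with open(path, "r", encoding="latin-1", newline="") as f:
        while remaining:                 # stop early once every candidate was seen
            chunk = f.read(chunk_size)
            if not chunk:                # end of file
                break
            remaining -= set(chunk)      # drop candidates present in this chunk
    return sorted(remaining)
```

Any character it returns should be safe to use as a temporary stand-in for the commas (or the pipes) when converting with filefilter.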
In Unix there is the "split" command, which works on lines. In Windows
there are many split commands available, none from MS and mostly splitting
on bytes. That would work if your file has fixed record lengths. I see
there is "Text-File-Splitter" which seems to work on lines. I haven't used
it.
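Where no line-based split utility is at hand, the same thing can be sketched in a few lines of Python. This is only an illustration, not a tested tool: the function name `split_by_lines`, the output naming scheme and the assumption that the first line is a header of variable names are all hypothetical. It streams the file one line at a time and repeats the header at the top of every part.

```python
def split_by_lines(path, lines_per_part, prefix="part_", header=True):
    """Split a text file into parts of at most lines_per_part data lines.

    If header is True, the first line is assumed to hold variable names
    and is copied to the top of every part. Returns the part file names.
    """
    if lines_per_part < 1:
        raise ValueError("lines_per_part must be at least 1")
    part_paths = []
    dst = None
    # latin-1 maps every byte to one character, so no decoding errors;
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding="latin-1", newline="") as src:
        head = src.readline() if header else ""
        try:
            for i, line in enumerate(src):
                if i % lines_per_part == 0:      # time to start a new part
                    if dst:
                        dst.close()
                    out_path = f"{prefix}{len(part_paths) + 1:04d}.txt"
                    dst = open(out_path, "w", encoding="latin-1", newline="")
                    dst.write(head)
                    part_paths.append(out_path)
                dst.write(line)
        finally:
            if dst:
                dst.close()
    return part_paths
```

A call such as `split_by_lines("big.txt", 10000000)` (the file name is made up) would write `part_0001.txt`, `part_0002.txt` and so on, each small enough to be read on its own.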
It is a shame that every input command in Stata is lacking useful features
that most of the other input commands seem to have. -in-, -if- and -keep-
are all things that should be universal.
dan feenberg
> Many thanks
> Rob
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/