You may wish to use a CSV file editor. I have used the following:
http://csved.sjfrancke.nl/index.html (slow site, be patient)
It allows specific columns or record ranges to be saved to a new file.
Alternatively, I have an ado-program that will break a file into
chunks by rows. With a syntax like
-chunky using filename1, index(1) chunk(1) saving(varnames.txt) list-
it would show the list of variable names if they are stored in the first
row of a file, while
-chunky using filename1, index(1) chunk(100) saving(filename2)-
would save the first 100 rows of a file.
(An early version can be picked up from:
http://www.stata.com/statalist/archive/2008-10/msg00048.html
but I have made revisions and the dofile below relies on the modified version.
Contact me offlist if you want the most recent version. )
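For reference, the do-file below assumes the modified -chunky- leaves
behind r(index) (the row at which the next chunk should start) and
r(eof) (1 once the end of the file has been reached). A quick check of
what your copy returns:

. chunky using VeryLargeFile.csv, index(1) chunk(1) saving("varnames.csv", replace) list
. return list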
One can then write a do-file that loops through the file, creating,
importing, and then appending the pieces:
*------------begin dofile-----------*
// Do file using chunky.ado to piece together parts of a very large file
// Pay particular attention to the edit points below for infile, chunksize, insheet, and keep
// edit VeryLargeFile.csv on the following line to point to your file
local infile VeryLargeFile.csv
// edit to size of chunk you want
local chunksize 10000
// Get just the first line if it has variable names
chunky using `"`infile'"', index(1) chunk(1) saving("varnames.csv",
replace) list
local chunk 1
local nextrow 2
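// the loop below relies on chunky returning r(index) (the row at which the
// next chunk should start) and r(eof) (1 once the end of the file is reached)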
while !`r(eof)' { // aborts loop when end of file is reached
    tempfile chunkfile chunkappend
    chunky using `"`infile'"', ///
        index(`r(index)') chunk(`chunksize') saving("`chunkfile'")
    if `r(eof)' {
        continue, break
    }
    else {
        local nextrow `=`r(index)'+1'
    }
    // use a shell command to append the varnames with the chunk
    !copy "varnames.csv"+"`chunkfile'" "`chunkappend'"
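    // (the + concatenation above is Windows shell syntax; on Unix/macOS something
    // like:  !cat varnames.csv "`chunkfile'" > "`chunkappend'"  should do the same)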
    // edit the following to conform to your csv delimiter
    insheet using "`chunkappend'", clear tab names
    // edit the following to keep specific variables
    keep *
    // save part of file and increment chunk count
    save part`chunk++', replace
}
// Append parts together
local nparts `=`chunk'-1'
use part1, clear
forvalues i=2/`nparts' {
    append using part`i'
    // uncomment the following line to erase the parts
    // erase part`i'.dta
}
describe
*------------end dofile-----------*
If satisfied with the import, you should save the combined dataset under
a different name and then delete all the part##.dta files that are left
behind, or uncomment the -erase part`i'.dta- line above (part1.dta will
still need to be deleted by hand, since that loop starts at part2).
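As a rough sketch of that final step (the output name is a placeholder, and it
assumes the lines are added to the end of the same do-file so that `nparts' is
still defined):

save MyLargeData.dta, replace
forvalues i = 1/`nparts' {
    capture erase part`i'.dta   // capture in case a part was already deleted
}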
I've used this program to good effect importing very large csv data dumps.
DC Elliot