Haiyan Gao wrote:
I have a very large dataset in CSV format, about 486,000 KB in size. The data
contain more than 100 fields and more than 300,000 records. I have
tried to open this file after set mem 500M (or 1000M), using
insheet using filename.csv, clear
The error message says that there is not enough memory to load the data.
Could anyone advise me on the following?
1) How to read only several fields from this CSV data file, say the
first, thirteenth and thirtieth?
2) What command should I try to load the whole data?
--------------------------------------------------------------------------------
The easiest and most convenient way is to use Stat/Transfer
( www.stattransfer.com ) for this kind of problem, especially if you're going
to encounter it regularly.
Absent that, you could use Stata's -file- command to -read- a limited number
of records from the CSV file, immediately -write- them to a -tempfile-,
-insheet- that, and save the result as an intermediate Stata dataset. Repeat
with successive chunks of the original CSV file (including the first record
each time, so that the variable names are read), and then -append- the pieces
(the intermediate Stata datasets). To automate the process, put it in a
-while- loop that has -file- check for the end-of-file marker. You can also
use -file- to read in just the first, thirteenth and thirtieth logical
records (rows).
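The chunked approach described above might be sketched roughly as follows. This is only an illustration, not tested against the poster's data: the file names bigfile.csv and bigfile.dta, the chunk size of 50,000 records, and all macro names are assumptions made for the example.

```stata
* Sketch only. Assumed names: bigfile.csv, bigfile.dta; assumed chunk size: 50,000.
local chunksize 50000
tempname in out
tempfile chunk piece combined
local first 1

file open `in' using "bigfile.csv", read text
file read `in' header              // first record: the variable names
file read `in' line                // first data record
local eof = r(eof)                 // copy r(eof) before other commands run

while !`eof' {
    * Write the header plus up to `chunksize' records to a temporary CSV file
    file open `out' using "`chunk'", write text replace
    file write `out' `"`macval(header)'"' _n
    local n 0
    while !`eof' & `n' < `chunksize' {
        file write `out' `"`macval(line)'"' _n
        local ++n
        file read `in' line
        local eof = r(eof)
    }
    file close `out'

    * Convert the chunk to a Stata dataset and add it to the running total
    insheet using "`chunk'", comma names clear
    if !`first' {
        save "`piece'", replace
        use "`combined'", clear
        append using "`piece'"
    }
    save "`combined'", replace
    local first 0
}
file close `in'
save bigfile, replace
```

Two caveats on this sketch. The combined dataset still has to fit in memory by the end, so it may be necessary to -drop- unneeded variables from each chunk right after -insheet-. Also, -insheet- decides variable types chunk by chunk, so a field read as numeric in one chunk and as string in another would make -append- complain and would need to be reconciled by hand.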
Joseph Coveney
P.S. It's considered better form to avoid replying to a previous post when
starting a new thread.