Note that awk might not be installed on some systems, especially under
Windows, but public domain versions should be downloadable for all
platforms on which Stata is supported.
Nick
[email protected]
Steven Samuels
There was a superfluous but harmless option in my first shell
statement.
Haiyan,
You might also try pre-processing the text file with awk. Here's an
example. I put the shell command into a Stata do-file, but you could
write a script for your system outside of Stata.
**************************CODE BEGINS**************************
* Data is in seven comma-separated fields in source.txt.
* We want fields 3 and 6
*"1,2,3,4,55,6,77"
*"11,22,33,44,5,66,7"
****************************************************
shell awk 'BEGIN {FS=","; OFS =","} ; {print($3, $6)}' source.txt > in.txt;
insheet x3 x6 using in.txt, comma
list
***************************CODE ENDS***************************
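
The same idea can be pointed at the fields asked about in the original
question (the first, thirteenth and thirtieth). What follows is only a
sketch run directly in a shell rather than through Stata's -shell-.
The file name sample.csv and its made-up contents are assumptions for
illustration, standing in for the real CSV file. Splitting on every
comma also assumes that no field contains a quoted comma.

```shell
# Build a small hypothetical stand-in for the real file:
# two records of 30 comma-separated fields each (r1f1 ... r1f30, etc.).
awk 'BEGIN {
    for (r = 1; r <= 2; r++) {
        line = ""
        for (i = 1; i <= 30; i++)
            line = line (i > 1 ? "," : "") "r" r "f" i
        print line
    }
}' > sample.csv

# Keep only fields 1, 13 and 30 of every record, still comma-separated.
awk 'BEGIN { FS = ","; OFS = "," } { print $1, $13, $30 }' sample.csv > in.txt

cat in.txt
```

The reduced file could then be read as in the example above, e.g. with
-insheet x1 x13 x30 using in.txt, comma-.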
On Jan 21, 2009, at 6:36 AM, <[email protected]>
> Many thanks for all your advice.
Joseph Coveney
> Haiyan Gao wrote:
>
> I have a very large dataset in CSV format (486,000 KB). The data
> contain more than 100 fields and more than 300,000 records. I have
> tried to open this file after set mem 500M (or 1000M) using
>
> insheet using filename.csv, clear
>
> The error message says that there is not enough memory to load the
> data.
> Could anyone advise me on the following?
>
> 1) How to read only several fields from this CSV data file, say the
> first, thirteenth and thirtieth?
> 2) What command should I try to load the whole data?
>
> ------------------------------------------------------------------------
>
> The easiest and most convenient way is to use Stat/Transfer
> ( www.stattransfer.com ) for this kind of problem, especially if you're
> going to encounter it regularly.
>
> Absent that, you could make use of Stata's -file- command to -read- in a
> limited number of records of the CSV file, turn right around and -write-
> them to a -tempfile-, then -insheet- that, and save it to an
> intermediate Stata dataset; repeat (reading the first record each time
> in order to read the variable names) with successive chunks of the
> original CSV file, and -append- the pieces (the intermediate Stata
> datasets). In order to automate the process, you'd put it in a -while-
> loop having -file- look for the end-of-file marker. You can use -file-
> to read in the first, thirteenth and thirtieth logical records (rows),
> as well.
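
The chunking step described above can also be done outside Stata. The
sketch below swaps in awk for Stata's -file- command, so it is not the
method quoted above, only the same split-then-append idea. The names
big.csv and chunk_N.csv, the sample contents, and the chunk size of 2
are all assumptions for illustration. It assumes one record per line
(no line breaks inside quoted fields) and variable names on the first
line.

```shell
# Hypothetical small stand-in for the large CSV: a header line plus 5 records.
printf 'id,x\n1,a\n2,b\n3,c\n4,d\n5,e\n' > big.csv

# Write chunk_1.csv, chunk_2.csv, ... holding at most `size` records each,
# repeating the header line at the top of every chunk so that each piece
# carries the variable names.
awk -v size=2 '
NR == 1 { header = $0; next }          # remember the header, do not emit it yet
(NR - 2) % size == 0 {                 # first record of a new chunk
    if (out != "") close(out)          # finish the previous chunk file
    out = "chunk_" (++n) ".csv"
    print header > out
}
{ print > out }                        # append the current record to the chunk
' big.csv

ls chunk_*.csv
```

Each chunk could then be read with -insheet-, saved as an intermediate
Stata dataset, and the pieces combined with -append-, as described
above. For a file of the size in question, size would be set far
larger than 2.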