A slight correction to my previous post (included in its entirety
below my signature): The second -use- command I showed should be
. use f g h i j using master
rather than
. use d e f g h using master
--Alan Riley
Alan Riley wrote:
> Martin Weiss has a dataset which started as a 2.4 GB csv file and has
> been converted to a 5.5 GB Stata .dta file. He has a 64-bit computer
> with 4 GB of RAM, which isn't quite enough to read in this dataset as
> a whole:
> > if only I could open the file and compress it... I have the latest gear in
> > terms of hard- and software (MP/2 10.0 64 bit, 4GB RAM, Vista Business 64
> > bit, ...) but it is next to impossible to open the 5.5 GB file. Virtual mem
> > makes things so slow it takes all the fun out of it... So I am stuck in a
> > bit of a quandary.
>
> He wishes he could read it in just once and use Stata's -compress- command
> on it to store the variables more efficiently. My guess is that all
> of the variables are stored as -float- (4 bytes) or -double- (8 bytes)
> when many could probably be stored as smaller types such as -byte-
> (1 byte) or -int- (2 bytes).
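A minimal do-file sketch of what -compress- does. It uses a made-up toy dataset, and the variable names -id- and -flag- are hypothetical rather than taken from Martin's data. Integer-valued variables held as -double- or -float- are demoted to the smallest type that still stores every value exactly:

```stata
* Hypothetical toy data: 100 observations held in wasteful storage types
clear
set obs 100
generate double id = _n          // integers 1..100 stored as 8-byte double
generate float flag = mod(_n, 2) // 0/1 values stored as 4-byte float
describe
* -compress- demotes each variable to the smallest type that holds its values exactly
compress
describe                         // id and flag should now both be -byte-
```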
>
> Austin Nichols made a couple of suggestions:
> > Can you put an 8GB memory stick on the computer--can't Vista treat
> > those as RAM? How did you turn your 2.4 GB .csv file into a 5.5GB
> > Stata file, anyway? Can you specify a different variable type in that
> > process, or save different sets of variables to different files (with
> > an identifier for later merging)?
>
> Austin's suggestion about saving different sets of variables to
> different files is exactly what I think Martin should do.
>
> First, let me say that an 8 GB memory stick would not really help.
> Although this is "memory", it is not the same kind of memory that
> is used as RAM by a computer system. These sticks are not much
> faster than hard drives when it comes to transferring large amounts
> of data, although they can 'find' the files stored on them more
> quickly.
>
> If Martin has a dataset named 'master.dta' with 10 variables named
> 'a b c d e f g h i j', he could execute the following in Stata to
> compress and recombine the entire file:
>
> . use a b c d e using master
> . compress
> . save part1
> . use d e f g h using master
> . compress
> . save part2
> . use part1
> . merge using part2
> . drop _merge
> . save newmaster
>
> Martin might need to do this in 3 or 4 parts, but hopefully after
> doing the above, he will be left with a new dataset which will
> fit entirely in the RAM on his computer.
>
>
> --Alan Riley
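If the job has to be done in 3 or 4 parts, the same split, compress, and recombine steps can be sketched as a do-file loop. This is a minimal sketch under stated assumptions. So that it runs on its own, it first builds a hypothetical toy master.dta (10 variables v1-v10, 1,000 observations). The grouping into parts (the locals part1-part3) is an arbitrary, hypothetical choice. -describe using master- lists the variables in a file without loading it, which may help in choosing the groups. The -merge using- form without a varlist is the Stata 10 observation-by-observation merge used in the commands above. Stata 11 and later write this as -merge 1:1 _n using-.

```stata
* Hypothetical toy master.dta: 1,000 observations of 10 integer-valued doubles
clear
set obs 1000
forvalues k = 1/10 {
    generate double v`k' = mod(_n, 50)
}
save master, replace

* List the variables stored in master.dta without loading the data
describe using master

* Hypothetical grouping of the variables into three parts
local part1 v1 v2 v3 v4
local part2 v5 v6 v7
local part3 v8 v9 v10

* Load each part on its own, shrink its storage types, and save it
forvalues p = 1/3 {
    use `part`p'' using master, clear
    compress
    save part`p', replace
}

* Recombine the parts observation by observation (Stata 10 -merge- syntax)
use part1, clear
forvalues p = 2/3 {
    merge using part`p'
    drop _merge
}
save newmaster, replace
```

Each variable appears in exactly one part, so no variable is read twice. Overlapping groups were the mistake corrected at the top of this message.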