Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: .dta storage, why is it so big?
From: Daniel Feenberg <[email protected]>
To: [email protected]
Subject: Re: st: .dta storage, why is it so big?
Date: Tue, 7 Jun 2011 14:08:45 -0400 (EDT)
On Tue, 7 Jun 2011, Daniel Marcelino wrote:
Hello to all,

today I compared the sizes of my old and new files, stored in R and
Stata respectively. This got me wondering why Stata's compression is
so inefficient compared to R's. Even though I use variable attributes
like labels, R's compression is incredible. For example, a 530 MB
Stata file turns into a 9 MB R file, and about 330 MB as a txt file.
So, my question is: do you know any trick for compressing Stata files
beyond the -compress- command?

Best,
Daniel
The Stata -compress- command does not do any sort of Shannonesque
compression. Rather, it converts each variable to the smallest storage
type that will hold it without conversion error. So a float that held
only small positive integers would be converted to a byte, but a
variable that was double precision yet had few distinct values (e.g.
the CPI in a short panel) would stay a double and continue to take up
8 bytes per observation, no matter how many observations there are.
Similarly, a dummy variable that was zero in all but one observation
would still take up a byte per observation.
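
For instance, here is a minimal sketch (the variable names are made up
and the -compress- output is paraphrased, not quoted):

    . clear
    . set obs 100000
    . generate float x = mod(_n, 10)   // small integers stored as float
    . generate double cpi = 218.056    // one repeated double value
    . compress
    (reports something like: variable x was float, now byte)
    . describe
    // x now takes 1 byte per observation instead of 4, but cpi stays
    // a double: no smaller type holds 218.056 without conversion error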
If you are running under Unix, you might run one of the Unix
compression commands on the .dta file and use the method described
here:
http://www.stata.com/support/faqs/unix/pipe.html
to read or write such files, achieving much better compression and
possibly higher speed.
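
As a rough sketch of that named-pipe approach (the filenames are
illustrative, and this assumes gzip and a Unix shell reached through
Stata's ! escape; see the FAQ above for the authoritative recipe):

    * compress a saved dataset, then read it back through a named pipe
    ! gzip mydata.dta
    ! mkfifo pipe.dta
    ! gunzip -c mydata.dta.gz > pipe.dta &
    use pipe.dta, clear
    ! rm pipe.dta

The backgrounded gunzip blocks until -use- opens the pipe for reading,
so the decompressed bytes stream straight into Stata without ever
landing on disk.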
Daniel Feenberg
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*