Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: .dta storage, why is it so big?
From: Daniel Feenberg <[email protected]>
To: [email protected]
Subject: Re: st: .dta storage, why is it so big?
Date: Tue, 7 Jun 2011 14:08:45 -0400 (EDT)
On Tue, 7 Jun 2011, Daniel Marcelino wrote:
Hello to all,

today I compared the sizes of my old and new files, stored in R and
Stata respectively. This got me wondering why Stata's compression is
so inefficient compared to R's. Even though I use variable attributes
like labels, R's compression is incredible. For example, a 530 MB
Stata file turns into a 9 MB R file, and about 330 MB as a txt file.
So, my question is: do you know any trick for compressing Stata files
beyond the -compress- command?

Best,
Daniel
The Stata -compress- command does not do any sort of Shannonesque
compression. Rather, it converts each variable to the smallest storage
type that will hold it without conversion error. So a float that held
only small positive integers would be converted to a byte, but a
variable that was double precision yet had few distinct values (e.g.
the CPI in a short panel) would stay a double and continue to take up
8 bytes per observation, no matter how many observations there are.
Similarly, a dummy variable that was zero in all but one observation
would still take up a byte per observation.
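
For instance, here is a minimal sketch (the variable names are made up
and the -compress- output is paraphrased, not quoted):

    . clear
    . set obs 100000
    . generate float x = mod(_n, 10)   // small integers stored as float
    . generate double cpi = 218.056    // one repeated double value
    . compress
    (reports something like: variable x was float, now byte)
    . describe
    // x now takes 1 byte per observation instead of 4, but cpi stays
    // a double: no smaller type holds 218.056 without conversion error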
If you are running under Unix, you might run one of the Unix
compression commands on the .dta file and use the method described
here:
http://www.stata.com/support/faqs/unix/pipe.html
to read or write such files, achieving much better compression and
possibly higher speed.
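
As a rough sketch of that named-pipe approach (the filenames are
illustrative, and this assumes gzip and a Unix shell reached through
Stata's ! escape; see the FAQ above for the authoritative recipe):

    * compress a saved dataset, then read it back through a named pipe
    ! gzip mydata.dta
    ! mkfifo pipe.dta
    ! gunzip -c mydata.dta.gz > pipe.dta &
    use pipe.dta, clear
    ! rm pipe.dta

The backgrounded gunzip blocks until -use- opens the pipe for reading,
so the decompressed bytes stream straight into Stata without ever
landing on disk.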
Daniel Feenberg
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*