David Kantor indirectly raised the question of how unused value
labels can be removed from a dataset to reduce its size. Here is a
solution with -labelsof- from SSC.
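* Requires -labelsof- (ssc install labelsof)
sysuse auto, clear
encode make, gen(make2)
* Keep a small subset; the label make2 still holds all 74 entries
drop if _n>5
* r(values) lists every value defined in the label
labelsof make2
local labels "`r(values)'"
foreach x of local labels {
    * Defining an empty label with -modify- removes that value from the label
    quietly count if make2==`x'
    if r(N)==0 {
        lab def make2 `x' "", modify
    }
}
* Only the labels still in use should remain
lab list make2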
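
If the numeric codes themselves carry no meaning, a rougher alternative that needs only built-in commands is to decode the variable, drop the old label, and encode again. This is just a sketch under that assumption: the codes get renumbered starting from 1, so it is not equivalent to the loop above, which keeps the original codes. The variable name make_str is made up for the illustration.

```stata
sysuse auto, clear
encode make, gen(make2)
drop if _n>5

* Recover the strings, then discard the encoded variable and its full label
decode make2, gen(make_str)
drop make2
label drop make2

* Re-encode: the new label contains only the values still present,
* but the numeric codes are renumbered from 1
encode make_str, gen(make2)
drop make_str
lab list make2
```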
Friedrich
On Fri, May 2, 2008 at 10:46 AM, David Kantor <[email protected]> wrote:
> Hello all,
>
> I just want to add some observations about encoding.
>
> When you encode a string variable, the file contains a copy of every
> distinct value. Consequently, it provides a space advantage usually only if
> many of the values are repeated. If all or most observations are distinct,
> then encoding will not gain a space advantage. (But you may have other
> reasons for encoding.)
>
> But even when encoding is advantageous in terms of space, there is one
> situation when it can backfire; I had not thought of this until it happened
> to me. I had a large file with a string variable with many distinct values
> -- though many were often repeated. I encoded it, and gained a significant
> space savings.
>
> Later, I created a multitude of smaller subsets of this file. Each one had
> far fewer distinct values of the encoded variable. But each file retained
> the full encoding table -- more than it needed. (Each file replicated the
> encoding table.) The result was that each of the small files was much
> bigger than it really needed to be. (And the total size may have been much
> more than the original, even if there had been no overlap of observations.)
> Subsequently, I decoded the variable, and the files shrank significantly.
>
> I thought this was something to be aware of.
> (It makes a potential case for having coding tables in a separate file. But
> there are plenty of reasons not to have it that way.)
>
> --David