Re: Re: Re: st: drop redundant value labels
My analysis resembles Sergiy's. If the trimmed-down dataset is much
smaller than the original, one way to go might be to -decode- all the
variables with value labels, drop all the label definitions, and then
-encode- all the -decode-d variables. Not especially attractive, but
it might be worth considering.
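
The round trip can be sketched outside Stata to show what it does to the codes. This is an illustrative Python sketch, not Stata code: the function names and the occupation labels are made up, and it assumes every value in the data has a label. The re-encoding step numbers the surviving strings 1, 2, ... in alphabetical order, which is how -encode- assigns codes by default, so the original numeric codes are not preserved.

```python
def decode(codes, labels):
    """Replace each numeric code with its label text (the -decode- step)."""
    return [labels[c] for c in codes]


def encode(strings):
    """Number the distinct strings 1, 2, ... in sorted order (the -encode- step)."""
    new_labels = {i: s for i, s in enumerate(sorted(set(strings)), start=1)}
    lookup = {s: i for i, s in new_labels.items()}
    return [lookup[s] for s in strings], new_labels


# Hypothetical label definition and a trimmed-down variable that uses two of the four labels
labels = {1: "clerk", 2: "miner", 3: "teacher", 4: "logger"}
codes = [2, 4, 2]

new_codes, new_labels = encode(decode(codes, labels))
print(new_labels)  # only the labels still in use survive
print(new_codes)   # the codes are renumbered
```

Only labels that still occur in the data come back, but "miner" moves from code 2 to code 2 by coincidence while "logger" moves from 4 to 1, so anything that depends on the old numeric values would need adjusting.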
Nick
Sergiy Radyakin
===============
Unless there is some information about how the final sample was
selected, brute force is the only way. It may be direct (cycle over
the values and check whether each one is still present) or more
involved, but with the same thing going on behind the scenes. One
thing to consider, however, is whether more labels are dropped than
kept. In some cases it might be more efficient to cycle through the
observations that are left than through all the labels, especially if
the observations are unique. Example: you have observations, each
representing an occupation, and each occupation has a label; you want
to keep only "dangerous" occupations (defined as you like). There will
likely be relatively few of them among all occupations, so go brute
force by observations and keep the labels that they use. You can also
define your labels as a dataset with two fields: a numeric code and a
string label. After the selection in the data has occurred, you can
merge the two datasets to determine which labels must be kept. In any
case, the overhead from carrying the extra labels should not be very
large.
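
The three strategies described above can be compared in a small sketch. It is written in Python purely to show the logic and is not Stata code: all function names and the occupation data are hypothetical. Labels are held as a mapping from numeric code to string, and the data as a list of codes left after selection.

```python
def prune_by_labels(labels, values):
    """Brute force over labels: for each defined code, scan the data for it."""
    kept = {}
    for code, text in labels.items():
        if any(v == code for v in values):
            kept[code] = text
    return kept


def prune_by_observations(labels, values):
    """Brute force over observations: collect the label each remaining value uses."""
    kept = {}
    for v in values:
        if v in labels:
            kept[v] = labels[v]
    return kept


def prune_by_merge(label_rows, values):
    """Treat labels as a two-field table (code, text) and keep rows matching the data."""
    used = set(values)
    return [(code, text) for code, text in label_rows if code in used]


# Hypothetical example: four occupations defined, two left after selection
labels = {1: "clerk", 2: "miner", 3: "teacher", 4: "logger"}
values = [2, 4, 2]

print(prune_by_labels(labels, values))
print(prune_by_observations(labels, values))
print(prune_by_merge(sorted(labels.items()), values))
```

All three keep the same labels. The first does work proportional to the number of labels times the number of observations, the second touches each remaining observation once, and the third is the lookup-table version of the merge, which is why the second and third are attractive when few observations survive out of many labels.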