>
> I am writing an egen function which, among other things, assigns a new
> value label to the newly generated variable. I am concerned that a label
> with the same name may already be defined, and I want existing labels to
> be protected. Several strategies could be used to get around this, such
> as checking whether a particular value label exists, combined with one of:
>
> i) exiting the routine and informing the user that this particular label
>    is already defined;
> ii) requesting that the user specify a unique label name via a labname()
>    option; or
> iii) generating a unique name using a -tempvar labname- call.
>
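Strategies (i) and (ii) both rest on one check: does a value label with a given name already exist? The sketch below shows one way to write that check. -labexists- is a hypothetical helper, not an official command. The sketch assumes that -label list- exits with a non-zero return code when the named label is not defined. The -version 7- line is an assumption about the Stata release in use.

```stata
* labexists: hypothetical helper, not an official Stata command.
* Returns r(exists) = 1 if the value label named in the argument is
* defined, 0 otherwise.
capture program drop labexists
program define labexists, rclass
    version 7
    args lblname
    if "`lblname'" == "" {
        di as err "label name required"
        exit 198
    }
    * -label list- fails when the label is not defined;
    * -capture- suppresses both the listing and the error message
    capture label list `lblname'
    return scalar exists = (_rc == 0)
end
```

Inside the egen function, strategy (i) would then be -labexists `lblname'-, followed by an error message and -exit 110- (the return code -label define- itself uses) whenever r(exists) is 1.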
> That said, I am more concerned with "good coding practice" and
> consistency than with the specifics of a solution for my routine.
>
> The programming conventions discussed on Statalist, or implied by
> official Stata routines, provide clear guidance with respect to new
> variables and new data files. For new variables, most routines use
> -capture confirm ...- early in the code to verify whether a variable
> already exists. For new data files, users are usually given the option
> to specify -replace- to make clear that it is OK to overwrite data;
> otherwise, nothing is overwritten.
>
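Value labels could follow the same -replace- convention used for data files. The user names the label through a labname() option, and an existing label is dropped and redefined only when -replace- is also given. This is again a hypothetical sketch: -mklabel-, its options and the two label values are placeholders, not an existing command.

```stata
* mklabel: hypothetical command illustrating a -replace- convention
* for value labels (not an official Stata command)
capture program drop mklabel
program define mklabel
    version 7
    syntax , LABname(string) [replace]
    * an existing label is left alone unless -replace- was specified
    capture label list `labname'
    if _rc == 0 {
        if "`replace'" == "" {
            di as err "label `labname' already defined; specify -replace- to overwrite it"
            exit 110
        }
        label drop `labname'
    }
    * placeholder codes and texts
    label define `labname' 1 "low" 2 "high"
end
```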
> I am not aware of a similar convention for value labels. I looked at the
> behaviour of -encode- to see how Stata would behave if I generated a
> variable with a name identical to a previously defined label (NB:
> -encode- uses the new variable name as the label name). It turns out
> that -encode- overwrites (strictly, adds its own codes to) an existing
> label of the same name without any warning. On the other hand, any
> -label define <labname>- statement is met with an error message if
> <labname> is already defined, unless the -modify- option is specified.
> See my postscript below for an example.
>
> One might point to the fact that Stata lets users overwrite scalars and
> matrices at will to argue that labels deserve no particular protection.
> However, a fundamental difference between labels and scalars/matrices is
> that the latter are not saved with the data. Hence, overwriting labels
> _could_ be viewed as a modification of the data. I wrote _could_ because
> -describe- does not treat changes to value labels as changes to the
> data. Consequently, users may exit Stata without being warned that the
> data have not been saved.
>
> At any rate, I am not suggesting that -encode- should behave one way or
> another; for now, I am taking its behaviour as a given. I am just
> wondering whether there is a preferred coding practice for preventing
> existing labels from being overwritten and, possibly, seeking a
> justification as to why changes to labels are not deemed to be changes
> to the data.
>
> P.S.:
>
> . * -encode- does not warn the user
> . * when a label is already defined
> . u c:\stata\auto, clear
> (1978 Automobile Data)
>
> . la def newvar 1 "a label" 2 "another label"
>
> . encode make, gen(newvar)
>
> . desc
>
> Contains data from c:\stata\auto.dta
>   obs:            74                          1978 Automobile Data
>  vars:            13                          7 Jul 2000 13:51
>  size:         3,774 (100.0% of memory free)
> -------------------------------------------------------------------------------
>                 storage  display     value
> variable name   type     format      label      variable label
> -------------------------------------------------------------------------------
> make            str18    %-18s                  Make and Model
> price           int      %8.0gc                 Price
> mpg             int      %8.0g                  Mileage (mpg)
> rep78           int      %8.0g                  Repair Record 1978
> headroom        float    %6.1f                  Headroom (in.)
> trunk           int      %8.0g                  Trunk space (cu. ft.)
> weight          int      %8.0gc                 Weight (lbs.)
> length          int      %8.0g                  Length (in.)
> turn            int      %8.0g                  Turn Circle (ft.)
> displacement    int      %8.0g                  Displacement (cu. in.)
> gear_ratio      float    %6.2f                  Gear Ratio
> foreign         byte     %8.0g       origin     Car type
> newvar          long     %17.0g      newvar     Make and Model
> -------------------------------------------------------------------------------
> Sorted by:  foreign
>      Note:  dataset has changed since last saved
>
> . la list newvar
> newvar:
> 1 a label
> 2 another label
> 3 AMC Concord
> 4 AMC Pacer
> 5 AMC Spirit
> <snip>
> 75 VW Scirocco
> 76 Volvo 260
>
> .
> . * But -label define ...- has a safety feature
> . * (i.e. option modify) preventing a user from
> . * overwriting a label
> . u c:\stata\auto, clear
> (1978 Automobile Data)
>
> . encode make, gen(newvar)
>
> . la def newvar 1 "a label" 2 "another label"
> label newvar already defined
> r(110);
>
> end of do-file
> r(110);
>
I almost always tackle this by using
a command like
. tempname lblname
(which I think is what Patrick means
by his reference to -tempvar-).
The labels defined with
a tempname will be -save-d
with the data so long as they have been
attached to a variable, which is what
we are talking about.
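Here is a minimal sketch of that approach, for what it is worth. It is written as an ordinary command rather than with the -egen- calling convention, so that it runs on its own. The name -medsplit-, its generate() option and the median split itself are hypothetical choices made only for illustration. The point is that the value label gets a temporary name, which should not clash with label names the user has chosen, and is then attached to the new variable.

```stata
* medsplit: hypothetical command, not an official Stata command.
* Creates 1 = at or below median, 2 = above median.
capture program drop medsplit
program define medsplit
    version 7
    syntax varname(numeric) [if] [in], GENerate(string)
    * the usual protection for new variables
    confirm new variable `generate'
    marksample touse
    quietly summarize `varlist' if `touse', detail
    quietly generate byte `generate' = 1 + (`varlist' > r(p50)) if `touse'
    * temporary name for the value label, unlikely to clash with
    * a label name chosen by the user
    tempname lbl
    label define `lbl' 1 "at or below median" 2 "above median"
    * attaching the label to the variable is what gets it saved with the data
    label values `generate' `lbl'
end
```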
I wasn't aware of -encode-'s power
to overwrite labels without authorisation.
As Patrick says, there is a range from
what is protected and cannot be changed
without explicit command to what is deemed
transient and trivial. (And, over time,
Stata has been tightening up on this.)
Although value labels are somewhere in between,
in many instances the overwriting of a set of value labels could
have major implications for data management
and analysis. On these grounds I would suggest
that this feature is at best a misfeature.
Nick
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/