These are very good points. In support of this idea, I note that many users
rely on value labels to document their data. If these get clobbered
inadvertently, then so does the documentation. Of course, I don't mean to
suggest that value labels can replace proper documentation, but even casual
reliance on value labels to define coded variables should be able to assume
that assigned labels will only change by intention.
-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of
[email protected]
Sent: Tuesday, November 19, 2002 10:20 AM
To: [email protected]
Subject: st: Convention RE ado's which may redefine value labels
I am writing an egen function which, among other things, assigns a new value
label to the newly generated variable. I am concerned that a label with the
same name may already be defined and wish that existing labels be protected.
Several strategies could be used to get around this such as checking whether
a particular value label exists, combined with either:
i) exiting the routine while informing the user that this particular
label is already defined;
ii) requesting that the user specify a unique label name via a
labname() option; or
iii) generating a unique name using a -tempvar labname- call.
That said, I am more concerned with "good coding practice" and consistency
than with the specifics of a solution for my routine.
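As a rough illustration of strategies (i) and (ii) above, the check for an existing value label could be sketched as follows. This assumes that -label list- exits with a nonzero return code when the named label is not defined. The local macro `labname' is hypothetical and would hold the name the routine intends to use (for instance, one taken from a labname() option).

```stata
* Sketch only: `labname' is a hypothetical local macro holding the
* label name the routine intends to define
capture label list `labname'
if _rc == 0 {
    * the label already exists: refuse to touch it
    display as error "value label `labname' already defined"
    display as error "specify a different name in labname()"
    exit 110
}
* safe to define the label from here on
label define `labname' 1 "first category" 2 "second category"
```

Exiting with return code 110 mirrors what -label define- itself does when a label is already defined, so the user would see the same behaviour either way.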
The programming conventions as discussed on Statalist, or implied by
official Stata routines, provide clear guidance with respect to new
variables and new data files. For new variables, most routines use -capture
confirm ...- early in the code to verify whether a variable already exists.
For new data files, users are usually given the option to specify -replace-,
to clarify that it is OK to overwrite data; otherwise, nothing is
overwritten.
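For comparison, the two existing conventions mentioned above might look like this inside a program. This is a minimal sketch, not code taken from any official routine. The local macros `newvar', `filename' and `replace' are hypothetical and stand for a proposed variable name, a proposed file name, and the contents of an optional -replace- option.

```stata
* new variables: stop early if the name is already in use
capture confirm new variable `newvar'
if _rc {
    display as error "`newvar' already defined"
    exit 110
}

* new files: overwrite only if the user explicitly asked for it
if "`replace'" == "" {
    confirm new file "`filename'"
}
save "`filename'", `replace'
```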
I am not aware of a similar convention for value labels. I looked at the
behaviour of -encode- to see how Stata would behave if I generated a
variable with a name identical to a previously defined label (NB: -encode-
uses the new variable name for the label name). It turns out that -encode-
silently modifies any existing label of the same name. On the other hand, any
-label define <labname>- statement is met with an error message if <labname>
is already defined, unless option -, modify- is specified. See my
postscript below for an example.
One may point to the fact that Stata lets users overwrite scalars and
matrices at will to suggest that labels do not deserve any particular
protection. However, a fundamental difference between labels and
scalars/matrices is that the latter are not saved with the data. Hence,
overwriting labels _could_ be viewed as a modification to the data. I wrote
_could_ since -describe- does not consider changes to value labels as being
changes to the data. Consequently, users may exit Stata without a warning to
the effect that the data has not been saved.
At any rate, I am not hinting that -encode- should behave one way or another;
for now, I am taking its behaviour as a given. I am just wondering whether
there is a preferred coding practice to prevent existing labels from
being overwritten and, possibly, a justification as to why changes to
labels are not deemed to be changes to the data.
Patrick Joly
[email protected]
[email protected]
P.S.:
. * -encode- does not warn the user
. * when a label is already defined
. u c:\stata\auto, clear
(1978 Automobile Data)
. la def newvar 1 "a label" 2 "another label"
. encode make, gen(newvar)
. desc
Contains data from c:\stata\auto.dta
  obs:            74                          1978 Automobile Data
 vars:            13                          7 Jul 2000 13:51
 size:         3,774 (100.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
headroom        float  %6.1f                  Headroom (in.)
trunk           int    %8.0g                  Trunk space (cu. ft.)
weight          int    %8.0gc                 Weight (lbs.)
length          int    %8.0g                  Length (in.)
turn            int    %8.0g                  Turn Circle (ft.)
displacement    int    %8.0g                  Displacement (cu. in.)
gear_ratio      float  %6.2f                  Gear Ratio
foreign         byte   %8.0g       origin     Car type
newvar          long   %17.0g      newvar     Make and Model
-------------------------------------------------------------------------------
Sorted by: foreign
Note: dataset has changed since last saved
. la list newvar
newvar:
1 a label
2 another label
3 AMC Concord
4 AMC Pacer
5 AMC Spirit
<snip>
75 VW Scirocco
76 Volvo 260
.
. * But -label define ...- has a safety feature
. * (i.e. option modify) preventing a user from
. * overwriting a label
. u c:\stata\auto, clear
(1978 Automobile Data)
. encode make, gen(newvar)
. la def newvar 1 "a label" 2 "another label"
label newvar already defined
r(110);
end of do-file
r(110);
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*