Roger Newson
> A query about approved programming practice. In official Stata, most
> programs go out of their way to protect users from their own stupidity
> whenever there is a possibility that a user might accidentally over-write
> an existing data set in the memory. In particular, if a data set is present
> in the memory, then -use- will only input a new data set if the user
> specifies the -clear- option, and -exit- will not exit from Stata unless
> either the user specifies -clear- or the data set is unchanged. A notable
> exception is the -collapse- command, which routinely destroys pre-existing
> data sets, presumably because any user who uses -collapse- is assumed
> thereby to have consented implicitly to have the existing data set
> destroyed. People in the Stata community who write their own packages
> usually want their programs to conform to the same high level of
> user-friendliness as official Stata. Is there a general set of rules,
> approved by StataCorp or otherwise, regarding when programs should or
> should not routinely overwrite existing data sets in memory without a
> -clear- option? I ask because I have previously written ado-files (notably
> -parmest- and -dsconcat-) which can overwrite existing data sets in memory,
> and I have been advised (by Bill Gould) that they should do more than they
> do to protect users from their own stupidity (as -use- does and -collapse-
> doesn't).
I am not aware of a single source for this in
StataCorp documentation.
What is most obvious, however, is not so much a set of rules
as a growing awareness of the importance of this topic and the
emergence of a variety of conventions. Of course,
Stata has long provided a series of devices to
stop or inhibit you from making substantial changes
to your data by accident.
Wearing two hats on my bald head, (1) as a user-programmer
and (2) as Executive Editor, Stata Journal, I
endorse Roger's implication that this is an important area.
What is more, it can be surprisingly contentious,
especially when programmers codify what are in essence
personal or site conventions about what is legitimate
or sensible (including what may be legitimately or
sensibly undocumented!) within programs which are
then circulated for wider use.
In one recent program I saw, the user's data were
destroyed and replaced with another data set
with absolutely _no_ indication in the help
file or in the accompanying documentation that
this would happen. In my view, this is totally
unacceptable. Fortunately, the program could
be, and was, rewritten to avoid this.
In many other cases, something like the -sort-
order may be changed without this being flagged. I
have encountered programmer comments of the
following forms, some flavour being added here,
and some being my own attitudes when I too
did this:
1. "This will usually put the data in
a more sensible order and so is really a
feature."
2. "This never matters since any other program
can, and indeed should, put the data in the -sort-
order which it needs."
3. "All users who know what they are doing
use identifiers, so at most you just
need to -sort- to get back to an original status."
The issue of -sort- order could be debated at some
length, but StataCorp recently
added the option to make programs -sortpreserve-,
so changing the -sort- order can always be avoided.
More importantly, official Stata's code has
seen considerable tightening up on this point
over the last few years, although some gaps
remain.
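
As a minimal sketch of the -sortpreserve- point, assuming Stata 8 or later and the auto data shipped with Stata: the program -smallest3- below is hypothetical (its name and behaviour are invented purely for illustration). It sorts the data for its own purposes, yet the caller's sort order should be restored when it exits.

```stata
* -smallest3- is a hypothetical example program, not an existing command.
capture program drop smallest3
program define smallest3, sortpreserve
    version 8
    syntax varname(numeric)
    * the program needs its own sort order ...
    sort `varlist'
    list `varlist' in 1/3
    * ... but -sortpreserve- restores the caller's order on exit
end

sysuse auto, clear
sort make
smallest3 price
* check which variable the data are sorted by after the call
local order : sortedby
display "sorted by: `order'"
```

Without -sortpreserve- on the -program define- line, the data would be left sorted by price after the call.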
The following seem to be very widely
used conventions:
1. An option such as -replace- or -clear-
should be required whenever replacing or clearing
existing data is the result of the action. Such options
can never be abbreviated. However, Stata is
not completely consistent here. -collapse-
and -contract- don't need such an option.
Perhaps it is thought that the purpose
and result of such commands are obvious.
2. A small but growing habit is the use
of a -force- option whenever it is
thought a good idea to underline
to users that some violence is being
done.
3. -nobreak- blocks protect the
data during delicate operations.
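
Conventions 1 and 3 might be combined along the following lines. This is only a sketch under stated assumptions: -makeresults- is a hypothetical program invented for illustration, and the test on c(changed) mimics what -use- does rather than reproducing any official code. The option is declared in capitals in -syntax- so that it cannot be abbreviated, and the destructive step sits inside a -nobreak- block.

```stata
* -makeresults- is a hypothetical example program, not an existing command.
capture program drop makeresults
program define makeresults
    version 8
    * CLEAR in capitals: the option must be typed in full
    syntax [, CLEAR]
    * c(changed) is 1 if the data in memory have changed since last saved
    if (c(changed) == 1) & ("`clear'" == "") {
        * return code 4: "no; data in memory would be lost"
        error 4
    }
    * do not let a Break leave the data half-replaced
    nobreak {
        drop _all
        quietly set obs 3
        generate byte id = _n
    }
end

sysuse auto, clear
* change the data so that they differ from the saved copy
quietly replace price = price + 1
* refused, because -clear- was not specified
capture noisily makeresults
display _rc
* explicit consent, so the data are replaced
makeresults, clear
describe, short
```

A stricter design would demand -clear- whenever any data are in memory, changed or not; which is preferable is a judgment call for the programmer.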
In the -stylerules- document on SSC I suggested
the following guidelines. (The full context
of that document is important for evaluating these.)
Some may think these very severe, but
in my own experience the more one writes
Stata programs, the more one wants to put
the responsibility for changing the data
where it belongs, on the user.
==================
Respect for datasets
In general, make no change to the data unless that is
the direct purpose of your program or that is explicitly
requested by the user. For example,
1. your program should not destroy the data in memory
unless that is essential for what it does
2. you should not create new permanent variables on
the side unless notified or requested
3. do not use variables, matrices, scalars or global
macros whose names might already be in use: there is
absolutely no need to guess at names unlikely to occur,
as temporary names can always be used (see help on
tempvar, tempname, and tempfile)
4. do not change the type of a variable unless requested
5. do not even change the sort order of data: programs can
easily be made sortpreserve.
=====================
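
To illustrate the guideline on temporary names, here is a minimal sketch, again assuming Stata 8 or later and the auto data. The program -msdreport- and its returned result r(msd) are hypothetical, invented only for this example. Every working variable and scalar has a temporary name, so nothing in the user's data or workspace can be overwritten, and nothing is left behind.

```stata
* -msdreport- is a hypothetical example program, not an existing command.
capture program drop msdreport
program define msdreport, rclass
    version 8
    syntax varname(numeric) [if] [in]
    * temporary indicator of the observations to use
    marksample touse
    * temporary names cannot clash with anything the user has defined
    tempvar dev2
    tempname m
    quietly summarize `varlist' if `touse', meanonly
    scalar `m' = r(mean)
    quietly generate double `dev2' = (`varlist' - `m')^2 if `touse'
    quietly summarize `dev2', meanonly
    display as text "mean squared deviation of `varlist': " as result r(mean)
    return scalar msd = r(mean)
end

sysuse auto, clear
describe, short
msdreport mpg
* the number of variables should be the same as before the call
describe, short
```

Temporary variables and scalars are dropped automatically when the program ends, so no explicit clean-up is needed.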
Nick