st: RE: how to be -assert-ive
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: RE: how to be -assert-ive
Date: Thu, 30 Sep 2010 14:20:50 +0100
Stas is referring to two papers. Note that Tips started up in Stata Journal 3(4).
SJ-3-4  dm0003  . . . . . . . . . . . . . . . Stata tip 3: How to be assertive
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
        Q4/03   SJ 3(4):448                                    (no commands)
        tips for using assert

SJ-1-1  pr0001  . . . . . . . . . . . . . . Statistical software certification
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
        Q4/01   SJ 1(1):29--50                                 (no commands)
        describes the automated process used to certify official
        Stata and how these techniques can be adopted to your own
        work
I doubt that citations are a very good measure of utility for Stata Journal Tips. People can use a Stata feature, sometimes very frequently, without it seeming obvious that it should be cited in a published paper, particularly if the focus of that paper is not itself the use of Stata.
However, the whole business of what is cited is rather odd. In many disciplines there is a strong tradition of bending over backwards to cite previously published substantive papers, which co-exists with a tradition of documenting how computations were done either very poorly or not at all. There's a feedback loop of journal rules and conventions leading to certain kinds of submissions and acceptances.
As an Editor of the SJ, I would certainly want to encourage people using Stata to quote Stata Journal material wherever appropriate, including within substantive journals. We live in interesting times when choice of software remains a big issue for many researchers and those who find Stata a good solution can do Stata, and Stata authors, an easy favour by citing their work.
But that's not Stas' main question.
I doubt that -assert-s are often used in ados, but I assert that they should be used heavily in
1. data management scripts to check that data are reasonable and consistent
2. certification scripts that check that programs produce what they should produce.
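
As a rough illustration of those two uses, here is a minimal sketch using the auto dataset shipped with Stata. The particular checks are my own choices for illustration, not taken from either of Gould's papers, and the variable name zrep78 is made up for the example.

```stata
* 1. Data management: check that the data are reasonable and consistent
sysuse auto, clear
assert !missing(price, mpg)                     // key variables are complete
assert price > 0                                // prices must be positive
assert inlist(foreign, 0, 1)                    // indicator takes only 0 or 1
assert inrange(rep78, 1, 5) if !missing(rep78)  // repair record on a 1-5 scale

* 2. Certification: check that commands produce what they should produce
regress price mpg weight
assert e(N) == 74                               // all 74 observations used
assert e(df_m) == 2                             // two predictors in the model

* -egen- should carry the missing-value pattern over to the new variable
egen double zrep78 = std(rep78)                 // zrep78 is a hypothetical name
assert missing(zrep78) == missing(rep78)
```

If every -assert- is true, the script runs silently to the end; the first false assertion stops it with a nonzero return code, which is what you want from a do-file that guards a dataset or a program.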
I'd expect -assert-s to be removed from most supposedly debugged ados. One of their main roles is for programmers to check that they got it right, or for people managing data to check whether someone got it right.
I would not use -assert- to check e.g. that a single scalar was within [0,1].
-confirm- on the other hand does belong in many ados. Its main role is usually to check whether users are providing suitable input.
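
A minimal sketch of that role, assuming a made-up program called mysummary (the name and its single argument are hypothetical, purely for illustration):

```stata
* mysummary is a hypothetical program, not an existing command
capture program drop mysummary
program define mysummary
    version 11
    args varname
    * Stop early with an informative error unless the user supplied
    * the name of an existing numeric variable
    confirm numeric variable `varname'
    summarize `varname'
end

sysuse auto, clear
mysummary price                  // price is numeric, so this runs
capture noisily mysummary make   // make is a string variable, so -confirm- fails
assert _rc != 0                  // the program exited with an error
```

Here -confirm- guards against bad user input inside the program, while the closing -assert- belongs to the test script that checks the program behaves as intended.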
Nick
[email protected]
Stas Kolenikov
This was the title of Bill Gould's Stata Tip#3 in the first issue of
Stata Journal; over 10 years, it has received a whopping 2 citations.
I wonder how often -assert-s are used in ado code? E.g., I know that
some sort of -egen- command should retain the pattern of missing
values from the source variable(s) to the newly generated variable; or
a regression command should return R^2 between 0 and 1; or a
prediction command should return non-negative values; after -mi-, I
would expect the missing values to be replaced by non-missing; etc.
Sometimes I do see -assert-s when I peek at the official Stata code,
but not every program would use it. Bill Gould's recent "Missing
manual" talk gives an example of prototypical code development in
which the program goes into an ado file, and a test script checks its
performance against the expected behavior using -assert-; and that's
how certification would usually proceed (see another of WG's papers, in
Stata Journal 1(1)).
Now, I wonder if using -assert-s in ado code is a desirable practice, and
whether other people use them a lot. In short ten-line programs, an
extra couple of -assert-s means a 20% increase in the number of lines, and
most likely some (although not necessarily a 20%) increase in
execution time. In long programs where a lot is going on, including
some convoluted -sort- operations, let alone numeric optimization or
root finding, an extra -assert- or two or three or five is negligible
in overall timing. As a matter of (i) good style and (ii) robust
programming, do I need to put more -assert-s in my code?
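
One way to put a number on the timing question is to time the same loop with and without an -assert-. This is only a sketch: the loop body, the 1,000 repetitions, and the timer numbers 1 and 2 are arbitrary choices, and results will vary by machine.

```stata
sysuse auto, clear
timer clear

* Timer 1: the work alone
timer on 1
forvalues i = 1/1000 {
    quietly summarize price
}
timer off 1

* Timer 2: the same work plus one -assert- per iteration
timer on 2
forvalues i = 1/1000 {
    quietly summarize price
    assert r(N) == 74
}
timer off 2

timer list    // compare elapsed seconds for timers 1 and 2
```

The difference between the two timers gives a rough idea of what one -assert- costs relative to the work surrounding it.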
A command with similar intention is -confirm-, and again it is used in
the official Stata code, usually to check inputs. And of course there
is -assert()- in Mata, as well.