I want to add something to my posting on the treatment of missing values
in Mata functions. You may remember that I divided the numeric functions
into three categories,
(M) Mathematical functions
(S) Statistical functions
(U) Utility functions
and then I said that (M) functions should handle missing values, it
does not much matter what (S) does, and (U) functions do not allow
missing values, or give a special meaning to them.
I said that it does not much matter what a category (S) function does because
good programming style is to make sure what is passed to them does not contain
missing values, and that is easy to do. It is good programming style because,
if exclusion is left until after the data are divided into separate,
easy-to-use matrices, one subroutine might exclude one set of observations
and another subroutine, another set.
Remember, what distinguishes a category (S) function is that it works on raw
data, and such data are invariably obtained from st_data() or st_view(),
where it is easy to exclude the missing values at the outset.
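
A minimal sketch of excluding missing values at the outset, assuming the
variables of interest are numeric and already in memory. The function name
get_complete() and its varlist argument are hypothetical, chosen only for
illustration. The third argument of st_data() is its selectvar; passing 0
asks for observations with a missing value in any of the listed variables
to be excluded.

```mata
mata:
// Hypothetical helper: returns only the complete cases of the named variables.
real matrix get_complete(string scalar varlist)
{
    // selectvar = 0: skip observations with missing values
    // in any of the variables listed in varlist
    return(st_data(., tokens(varlist), 0))
}
end
```

Everything downstream then works from the one matrix, so every subroutine
sees the same set of observations.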
Thus, I argued, although I did not explicitly say this, writing additional
code to handle missing values in a category (S) function is probably a waste
of time because
1. It is probably better that a category (S) function does not
allow missing values, because otherwise, the user of the
function may be led into sloppy and dangerous habits.
2. Ben Jann, who asked the original
question, said he was doing this by coding
if (missing(x)) _error(3351)
Good idea, except -missing(x)- can be expensive to calculate.
The missing() function has to make a pass through the data,
looking for missing values, to establish that there are not
any.
Hence, even though it is probably better that a category (S) function
does not allow missing values, there is a cost to imposing that.
So here is what I add now:
In a category (S) function that does not accept missing values,
it is acceptable to omit
if (missing(x)) _error(3351)
as long as the function does something ugly in the presence of
missing values. The ugly action could be to abort with an error,
or it could be to return a result with some or all missing values.
As long as something ugly happens, the user of the function cannot
be misled.
On the other hand, if the function would return something that
could be misinterpreted as a valid result, one should probably
include
if (missing(x)) _error(3351)
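
To make the distinction concrete, here is a rough sketch with two
hypothetical functions (mymean_loud() and mymean_checked() are names made up
for this illustration, not existing routines). It relies on sum() treating
missing values as zero unless its optional second argument is nonzero. The
first function lets missing values propagate, so the result is itself
missing and the explicit check can be omitted. The second would otherwise
return a plausible-looking number, so it keeps the check.

```mata
mata:
// Hypothetical: "ugly" on missing input, so no explicit check is needed.
real scalar mymean_loud(real colvector x)
{
    // second argument nonzero: missing values are treated as missing,
    // so the sum (and hence the mean) is missing if any element is
    return(sum(x, 1) / rows(x))
}

// Hypothetical: sum(x) alone counts missing values as zero, which would
// produce a result that looks valid; hence the explicit check.
real scalar mymean_checked(real colvector x)
{
    if (missing(x)) _error(3351)
    return(sum(x) / rows(x))
}
end
```

The first version pays nothing extra when the data are clean; the second
pays for the additional pass that missing() makes through x.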
-- Bill