There are intersecting issues here on several different
levels.
Let's start with the obvious.
0. You want Stata to be smart enough to ignore missings as
irrelevant when they are so, and you don't usually notice,
and don't usually complain, when that works as designed.
1. Missing values have to go somewhere when you -sort- the
data. There is no case for "in the middle"; it must be
one end or the other, above the very highest or below the
very lowest. Observations with missing values can't just
hover in some philosophical mystery zone; they must _go_
somewhere.
2. You have to decide what to do with missings when
you use inequalities. This is really the same issue as #1:
wherever missings sort, that is how they should compare.
Sometime in the year 0, meaning 1985, StataCorp, or more
precisely CRC, plumped for their choice:
3. Stata chooses the high end. Numeric missings are treated
as larger than any nonmissing number.
large. Rumour, or history, or Bill Gould, says that, mostly,
he had been irritated too many times by ploughing through
ordered lists from Some Alternative Software that started
with values he didn't care about. The other way, you can
stop reading when it stops being interesting.
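
A minimal sketch of #1 to #3 together, using a made-up variable x
(the data are purely illustrative, not from anyone's real problem):

```stata
* toy data: three nonmissing values and one missing
clear
input x
3
.
1
5
end

* missings sort to the high end, after the largest nonmissing value
sort x
list, noobs

* . counts as larger than 2, so the missing is included here
count if x > 2

* excluding missings explicitly
count if x > 2 & x < .
```

The first -count- should report 3 and the second 2, the difference
being the one missing value.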
Really, not much has changed since 1985 and those of us
still here and still using Stata in say 2029 will not,
I guess, be discussing anything different.
For a start, #3 has been the rule for so long and is embedded
in so many habits and so much code that changing any of
it is a recipe for mayhem and madness.
I don't think Tom's scheme has a chance of lift-off.
I gather he wants -if- to change behaviour, or x > 2 to change
meaning, and either makes me feel really queasy. There is perhaps
a little more chance of new functions, say
    gt(x, 2) meaning x > 2 & x < .
    ge(x, 2) meaning x >= 2 & x < .
but I am not sure they would actually be used much even if they
were introduced.
If this thread continues long enough, someone will
suggest some kind of three-way logic in which missings
are not high or low but just different. Tom in a way has
perhaps done that already. David Kantor
gave a talk on three-way logic at one Boston meeting
and the discussion was fast and furious. As I recall,
the audience who spoke divided into three (surprise):
those who were clear that three-way logic was a bad idea;
those who wanted some kind of three-way logic, but
definitely not David's; and those who liked David's
scheme. David himself seemed to like his own scheme less
the more he thought about it. And it didn't improve from there.
I gave the following talk, on directional data, and was able
to explain my subject as one of circular arguments.
I gather that StataCorp have batted this back and
forth internally but, despite many discussions, got
no further than the idea that three-way logic would
solve a few problems but make things much worse for
most users, especially those in the first decade of
their Stata experience.
As a more detailed footnote, -inrange()- has been
around for a while and already offers one kind of solution.
-inrange(x, 42, .)- means "x >= 42 & x < .".
In fact, this is, in essence, a generalisation of -ge()-
above. My impression is that it hasn't caught on much,
which rather weakens any case for new functions.
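
As a quick check of that equivalence, here is a small sketch with
invented data; the variable names viaineq and viainrange are just
labels I made up for the comparison. It includes an extended missing
value, .a, to show why "x < ." is a safer tag than "x != .".

```stata
* toy data: values either side of 42, plus . and .a
clear
input x
41
42
100
.
.
end
replace x = .a in 5

* the explicit inequality and -inrange()- should agree everywhere
generate byte viaineq = x >= 42 & x < .
generate byte viainrange = inrange(x, 42, .)
assert viaineq == viainrange
list, noobs

* .a is not equal to . but is still missing, so "!= ." lets it through
count if x >= 42 & x != .
count if x >= 42 & x < .
```

The -assert- should pass silently. The first -count- should report 3,
because .a slips past "!= .", and the second 2.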
Nick
Steichen, Thomas J. wrote:
> Isn't the simplest solution that missing should never be treated in
> Stata code as a "number"?
>
> Thus, things like sorts would need a documented definition of where
> missing go but we wouldn't have to work around the "numeric" missing
> so often.
>
> For example,
> replace x = 3 if x > 2 & x != .
> becomes the much simpler
> replace x = 3 if x > 2
>
> I wonder how often I've messed up analyses because I forgot to tag
> on the "& != ." ?
>
> (Hmmmmm, the "& != ." kind of looks like cartoon-speak for what I
> usually say when I notice I've failed to add the tag!)
>
> Clearly, I'd much rather have Stata's code deal with this than for me
> to remember all the time, even if there is a processing overhead.