Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Poststratification weighting, subpop, and missing values
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Poststratification weighting, subpop, and missing values
Date: Thu, 27 Sep 2012 16:03:53 +0100
I take your point, and agree that this kind of technicality can be an
issue for people who may have to manage and/or use code for all kinds
of different software. I am very happy to commend -missing()- and
-!missing()- too and have done so in print.
But the short-cut is not compulsory. It is for those who understand it
and like it. It's fair advice not to use something you won't remember
or don't understand.
Specifically, the idea that missing values count higher than any
non-missing value is built into how Stata -sort-s data and I am
confident that it's here to stay. That's not a matter of internal
implementation, but quite explicit in Stata's behaviour.
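For anyone who wants to check the ordering for themselves, here is a minimal sketch. The toy variable y and its values are made up purely for illustration; only -sort-, -count-, and -missing()- are standard Stata.

```stata
* Toy data (hypothetical): three numbers, one system missing (.),
* and one extended missing value (.a)
clear
input y
1
.
3
.a
2
end

* Numeric ordering: all non-missing numbers < . < .a < ... < .z,
* so both missing values land at the end of the sorted data
sort y
list, noobs

* These two conditions select the same 3 non-missing observations
count if y < .
count if !missing(y)

* This one counts 4, because .a is not equal to . and slips through
count if y != .
```

Either of the first two conditions does the job; the third is the one that extended missing values break.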
Where do you draw the line anyway? For example, Stata evaluates true
or false as 1 or 0 (e.g. (2 == 2) as 1 and (3 == 2) as 0) and many
Stata programmers use that sort of evaluation routinely. I would not
be put off in the slightest by remembering that there are languages in
which "true" is -1 not 1.
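A small sketch of that sort of evaluation, using a made-up variable x; the names big and big2 are hypothetical too. It also shows where the ordering of missing values matters in such code.

```stata
* True and false evaluate to 1 and 0
display (2 == 2)
display (3 == 2)

* Toy data (hypothetical): two numbers and one missing value
clear
input x
1
5
.
end

* Indicator built directly from a logical expression.
* Because missing exceeds every number, big is 1 for the missing x too.
generate byte big = x > 3

* Restricting to non-missing x leaves the indicator missing instead
generate byte big2 = x > 3 if x < .

list, noobs
```

Replacing -if x < .- with -if !missing(x)- in the last -generate- gives the same result.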
Nick
P.S. I don't think the Stata users on Stack Overflow who rely on your
support would agree that you "don't know Stata that well, after all".
On Thu, Sep 27, 2012 at 3:44 PM, Stas Kolenikov <[email protected]> wrote:
> On Thu, Sep 27, 2012 at 9:03 AM, Steve Samuels <[email protected]> wrote:
>>> 3. I use the clause "if !missing(y)" above, rather than "if y ~=.", because
>>> the latter would not capture missing values like ".a".
>> This seemed like a slick idea at 5:00 am, but Nick Cox privately reminded me of
>> a far better one to accomplish the same thing:
>>
>> "Tony Lachenbruch pointed out in 1992 that -if y < .- saves a character
>> on -if y != .- or -if y ~= .- and the tip gained extra force when .a
>> ... .z were introduced."
>
> With all due respect, I would never use a short cut that deals with
> this ordering of missing values. First, I personally do not keep track
> of that in my head (that is to say, I don't know Stata that well,
> after all). Second, this is something programmers refer to as "strong
> coupling", as it relies on the knowledge of highly technical details
> of internal implementation of the missing values that may or may not
> be legal to use and stable in different versions. There's arguably
> legacy code that does use this ordering inside the official Stata
> code, so Stata Corp is unlikely to change that, but there is no
> guarantee that in Stata 45 it will still be that way. From this
> stability and independence of implementation perspective, the function
> -missing()- is provided exactly for the reason of being able to
> correctly deal with whatever the system of missing values is in that
> Stata 45 version. Third, strong coupling also means that if somebody
> else with little knowledge of Stata were to use that code, and port it
> into say R or Python or whichever other software that has a different
> system of missing values (say, the missing value is MINUS infinity,
> not the PLUS infinity, or is not-a-number and cannot be compared to a
> number), then this snippet of code will produce errors at best, and
> totally wrong results at worst.
>
> Fourth, I type fast enough to put -if !missing(y)- in my do-files in
> about as much time as it would take me to type -if y < .-.