Dear All,
it has just occurred to me that string variables do not have extended
missing codes. A colleague of mine argues that this is perfectly fine,
because:
1) one can use any text to stand for particular situations ("not
applicable", "not responded",...)
2) for numeric values there are operations defined that must yield a
missing value if any argument is missing.
But in a situation where I classify, say, firms by the first letter of
their name, I will have "Not applicable" and "No response" as instances
in section "N", which is not what I want. Hence every time I deal with
strings like that, I have to check specifically for particular string
values. Since a different data entry operator inevitably chooses a
different coding, the programs become highly dependent on a particular
dataset; it is also quite tedious and annoying. One
solution I see is to create a masking variable, which for each
observation holds an agreed-upon code, e.g. 0=not
applicable; 1=valid observation; 2=applicable, but refused to answer;
3=applicable, but respondent doesn't know; etc.
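A minimal sketch of such a masking variable, assuming the auto dataset
that ships with Stata (where make is a string variable). The two
-replace- lines only plant example values; make_status, mstat and
make_group are illustrative names, not anything Stata predefines:

```stata
* Sketch only: make_status, mstat and make_group are made-up names
sysuse auto, clear
replace make = "Not applicable" in 1
replace make = "No response" in 2

* One agreed-upon coding for the status of each observation
label define mstat 0 "not applicable" 1 "valid observation" ///
    2 "refused to answer" 3 "does not know"
generate byte make_status = 1
replace make_status = 0 if make == "Not applicable"
replace make_status = 2 if make == "No response"
label values make_status mstat

* Classify by first letter only where the string is a real value
generate make_group = substr(make, 1, 1) if make_status == 1

tabulate make_group
tabulate make_status
```

Observations excluded by the -if- condition get the empty string "" in
make_group, while make_status still tells the two cases apart.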
I don't see this as a good solution, and I wonder whether there is
any technical possibility to instruct Stata that a particular string
value should be treated as a missing value in some operations. I
imagine something along these lines:
char define make[extmiss_a] Not applicable
char define make[extmiss_b] No response
And later
gen make_group=substr(make,1,1)
would create empty values for those observations that had "Not
applicable" or "No response".
(However, I still want to be able to distinguish between the two in
some cases, like -tabulate-.)
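As far as I know, Stata stores such characteristics but does not act on
them, so the behaviour above has to be emulated by hand. A rough sketch
under that assumption, again using the auto dataset with two planted
example values; make_miss is a made-up helper variable:

```stata
* Sketch only: Stata does not interpret extmiss_* itself;
* the loop below reads the characteristics back and applies them.
sysuse auto, clear
replace make = "Not applicable" in 1
replace make = "No response" in 2

char define make[extmiss_a] "Not applicable"
char define make[extmiss_b] "No response"

generate make_group = substr(make, 1, 1)

* make_miss: 0 = valid, 1 = matches extmiss_a, 2 = matches extmiss_b
generate byte make_miss = 0
local k = 0
foreach c in a b {
    local ++k
    local txt : char make[extmiss_`c']
    replace make_miss = `k' if make == "`txt'"
}

* Blank out the group for the flagged observations
replace make_group = "" if make_miss > 0

tabulate make_group
tabulate make_miss
```

This keeps the codes attached to the variable rather than hard-coded in
each program, but every command that should respect them still has to
check make_miss explicitly.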
What do you think about it? Are there extended missing string codes in
other statistical packages?
Thank you,
Sergiy Radyakin