Dear All,
Stata's help on quotes contains the following line:
----------------------- excerpt from -help quotes- -----------------------
No rational user would use `""' (called compound double quotes)
instead of "" (called simple double quotes), but smart programmers do
use them
----------------------- end of excerpt -----------------------
As we all remember, the reason for this is that the local or global
macro we are evaluating may itself contain double quotes, e.g.:
===========================
. local a `"first`=char(34)'second"'
. display "`a'"
firsttoo few quotes
r(132);
. display `"`a'"'
first"second
===========================
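A toy Python model of the point above. It assumes, for illustration only, that the macro has already been substituted and that a string literal is then delimited by plain character matching: with simple quotes the literal ends at the first embedded {"}, leaving a stray {"} behind (consistent with the "too few quotes" error), while compound quotes are matched by nesting, so the embedded {"} survives. This is not Stata's actual parser, and the function names are hypothetical.

```python
def simple_quoted(text):
    """Return (literal, rest) for text that starts with a simple double quote.
    The literal ends at the very next double quote, whatever follows it."""
    assert text.startswith('"')
    end = text.index('"', 1)
    return text[1:end], text[end + 1:]


def compound_quoted(text):
    """Return (literal, rest) for text that starts with `" ... "'.
    Nested `" "' pairs are counted, so a lone " inside does not end it."""
    assert text.startswith('`"')
    depth, i = 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == '`"':
            depth += 1
            i += 2
        elif pair == '"\'':
            depth -= 1
            i += 2
            if depth == 0:
                return text[2:i - 2], text[i:]
        else:
            i += 1
    raise SyntaxError("too few quotes")


content = 'first"second'  # what `a' holds once char(34) has been substituted

# display "`a'": the literal stops at the embedded quote
print(simple_quoted('"' + content + '"'))       # ('first', 'second"')

# display `"`a'"': the whole content survives
print(compound_quoted('`"' + content + '"\''))  # ('first"second', '')
```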
Nothing new so far, and it seems that "smart programmers" would indeed
prefer to always use compound quotes and thereby be `"smart
programmers"'. But what about this?
===========================
. local a "first`=char(96)'second"
. display `"`a'"'
first
. display "`a'"
first`second
===========================
The first thing you may have noticed is that I used compound quotes for
the local assignment in the first example and simple quotes in the
second example. This is because
. local a `"first`=char(96)'second"'
invalid syntax
r(198);
fails already at the assignment stage and is thus not interesting.
However, in my case the contents of the local are read from a file
with:
file read fh oneline
so no quotes are involved in assigning a value to the local. The direct
assignment here serves only to make the example easy to replicate.
What we observe next is that Stata did not fail with an error message
on the first -display-: it silently displayed just {first}.
Stata developers have spent a great deal of effort explaining the
necessity of compound quotes and how Stata can get confused in their
absence. Let's see how a rational Stata should behave in this case.
We know that the content of the local is exactly what is written
between the curly brackets here, {first`second} (we can double-check
this with -macro dir-).
Let's see how Stata could parse {display `"first`second"'}.
It could start left-to-right and get the first token, which is the
command {display}.
Stata could then execute the -display- subroutine, which would request
the rest of the line and attempt to display it on the screen.
What can Stata display? Almost everything, and at the same time almost
nothing. Stata can't -display- classes, matrices, plugins, etc. (it
must use class describe __, matrix list ___, ...).
But it can -display-:
1) numerical constants
2) string constants
3) values of variables (first observation)
4) values of local and global macros and extended macro functions
5) string expressions (which as a generalization include #1-#4) and functions
and perhaps something else.
So Stata would need to determine what is being fed to -display-.
It reads the first character {`} and realizes that it is either #2
(a constant in compound quotes) or #4 (evaluation of a local).
It then reads the next character {"} and realizes that, since a local's
name may not contain {"}, it must be a constant in compound quotes
(case #2).
It then reads the value of this constant, f-i-r-s-t, and hits the {`}
character, which has a special meaning: it is either the beginning of
a (nested) compound quote (case #2) or the evaluation of a local macro
(case #4). To resolve this, Stata should read the next character, which
is {s}. That is not a quote, so this is not a nested quote but a macro
evaluation (case #4).
OK, now Stata switches to extracting the name of the local to be
evaluated and continues char-by-char, s-e-c-o-n-d, until it hits the
{"} character!
Again, Stata hits the {"} character while it is in the mode of
extracting the name of a local.
Local names may not contain {"}, and thus Stata must stop with an error
(I believe error 198, whose message is not always "invalid syntax" but
sometimes changes to "___ invalid name", such as in this case):
. local "a=1
_"a=1 invalid name
r(198);
Here Stata realizes that the token following -local- must be a valid
local name, and thus stops with an error because it encounters the
invalid character {"}.
I strongly believe that it should issue the same error (an invalid
character within a local's name) when I type:
display `alpha"beta'
but it doesn't: it displays nothing.
Why did I write {alpha} and {beta} above, and not {a} and {b}? One
would also expect the same result from
display `a"b'
If you ran all the steps above, try it and see what happens. (Stata
should by no means go looking for what it is trying to find, but should
realize immediately that this is simply impossible.)
To make a long story short, I don't want Stata to issue an error
message in this case (it is OK):
display `undefined_local_name'
But I do want an error message in the following case:
display `impossible_local_name'
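The left-to-right procedure walked through above, together with the proposed rule (an undefined name expands to nothing, an impossible name is an error), can be sketched as a toy Python model. This is what the text argues Stata should do, not what Stata 9.2/10.1 actually does. The name-character rule, the error wording, and the function name {expand_line} are assumptions made for illustration.

```python
import re

# Assumption for illustration: a valid local-macro name character is a
# letter, digit, or underscore. Stata's exact rules may differ.
NAME_CHAR = re.compile(r"[A-Za-z0-9_]")


def expand_line(line, macros):
    """Expand `name' references left to right, following the hypothesized
    rules: `" opens a compound quote (copied through unchanged), any other `
    starts a macro name, an undefined name expands to "", and a character
    that cannot occur in a name raises an error."""
    out, i = [], 0
    while i < len(line):
        ch = line[i]
        if ch == "`" and line[i + 1:i + 2] == '"':
            out.append('`"')  # opening of a compound quote
            i += 2
        elif ch == "`":
            j = i + 1
            # Collect the macro name up to the closing apostrophe.
            while j < len(line) and line[j] != "'":
                if not NAME_CHAR.match(line[j]):
                    # e.g. a " inside the name: the name is impossible
                    raise SyntaxError(line[i:j + 1] + " invalid name")
                j += 1
            if j == len(line):
                raise SyntaxError(line[i:j] + " invalid name")  # no closing '
            out.append(macros.get(line[i + 1:j], ""))  # undefined -> ""
            i = j + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Text that -display- sees once `a' (holding first`second) is substituted:
try:
    expand_line('`"first`second"\'', {})
except SyntaxError as err:
    print(err)  # `second" invalid name

print(repr(expand_line("`undefined_local_name'", {})))  # ''
```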
I admit that Stata's task is very complex and that parsing expressions
is not easy. I made a series of simplifications and assumptions trying
to understand how it might work, or how I would do it myself, and my
assumptions might be too strong. I am particularly concerned about the
assumption that Stata parses expressions left-to-right :), which is
almost surely not true, given this observed behavior:
. di `alpha`'
`alpha invalid name
r(198);
There are other, more subtle inconsistencies that I can't explain, for
example, why this is valid:
. local answ `"yes`=char(96)'"'
while this
. local answ `"yes`=char(96)' "'
invalid syntax
r(198);
fails (the only difference is a single space added before the closing {"'}).
The lessons I've learned from this lengthy exercise are the following:
1. `"smart programmers"' are not always smart. When they write {if
`"`answ'"' == "yes"} they avoid the trouble of having a {"} symbol
inside the local, but run into the symmetric trouble of encountering a
{`}. Note that "smart programmers" don't have this second problem,
because {if "`answ'" == "yes"} works just fine in this case.
2. Given this, how would a {smart programmer} test whether the local
{answ} equals {yes}, given that it can potentially contain an unmatched
{`} and an unmatched {"}?
(The only solution I have come up with so far would work only if Stata
issued an error message where I suggested, which it currently doesn't.)
3. Reading lines from a .do file and passing them around in compound
quotes is an excellent way to find forgotten {'} (the closing
counterpart of {`}). Indeed, some parts of a program might look like
this:
if (...rarely true condition...) {
if `local_a'==`local_b {
display "equal"
}
}
else {
** do something else
}
and the mistake (the missing closing {'}) may go unnoticed by the
programmer for a long time, because Stata never reaches the part where
it is missing.
A small program that reads each line of a .do or .ado file and does
anything at all with the value read (in compound quotes) will
automatically fail when it encounters a line with unbalanced {`} and
{'}, without any understanding of what the line does, what the
commands mean, or which part of the program it is in (though there are
probably false positives as well).
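The approach above relies on Stata itself failing on the bad line. As a swapped-in alternative for illustration (not the method described above), a standalone heuristic checker can flag the same kind of line by tracking how many {`} are still waiting for a {'}. The function name {unbalanced_lines} is hypothetical, and the heuristic has its own false positives and negatives.

```python
def unbalanced_lines(lines):
    """Yield (line_number, line) for each line in which some ` is never
    closed by a matching '.

    Heuristic: a ' with no pending ` (e.g. an apostrophe in a comment) is
    ignored, and constructs spanning several lines are not understood, so
    false positives and false negatives are both possible."""
    for number, line in enumerate(lines, start=1):
        depth = 0
        for ch in line:
            if ch == "`":
                depth += 1
            elif ch == "'" and depth > 0:
                depth -= 1
        if depth > 0:
            yield number, line.rstrip("\n")


# The fragment from the example above (with its missing closing ').
sample = [
    "if (...rarely true condition...) {",
    "    if `local_a'==`local_b {",
    '        display "equal"',
    "    }",
    "}",
]

for number, line in unbalanced_lines(sample):
    print(number, line.strip())  # 2 if `local_a'==`local_b {
```

To scan a real file, pass an open file object instead of the list, e.g. {unbalanced_lines(open("myprog.ado"))}, where the file name is only a placeholder. Ignoring a {'} with nothing pending keeps ordinary apostrophes in comments from being counted, while correctly nested compound quotes such as {`"`answ'"'} come out balanced.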
4. The way Stata interprets comments, line terminators, and quotes
(see my older emails about recursion of comments and line terminators
in Mata) will probably remain a nebula the size of Andromeda.
If you are still reading here, thank you for your extraordinary patience.
A copy of this message has not been submitted to Stata's technical
support; it is too long, boring, and probably biased. But any comments
are welcome.
Stata 9.2 and 10.1, current Windows versions.
I wish everyone a great Super Bowl weekend!
Sergiy Radyakin
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/