Surprise is in the mind of the beholder. Your scalar was in fact set
correctly; the surprise comes from how -di y- is then interpreted, which
depends on the conjunction of three rules:
1. Given a choice between interpreting something as a variable and that
same something as a scalar with the same or equivalent name, Stata
always goes for the variable, unless instructed otherwise. (That seems
clear-cut. Variables are first-class citizens in Stataland; scalars are
minor, ephemeral and dispensable.)
2. You can abbreviate variable names. Thus in your case "y" can mean
"year", as no other variable name begins with "y". (We could argue
whether that was a good decision, but too late now.)
3. Given a variable in a context where only a single value makes sense,
Stata goes for the first observation. Thus
. di varname
is always taken to mean
. di varname[1]
(Same comment as above. Sure, that seems dopey when it bites you, but my
guess (emphasise guess) is that Stata doing what you think it should
would depend on Stata trying to work out what the user should mean, and
that's hopeless as a route to software design. People's partners should
be good at sensitive intuition, but people's programs should stick to
syntax, not semantics or intentions.)
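To see the three rules acting together, here is a minimal sketch to run
as a do-file. It assumes the first five observations from the listing in
the question and the default -set varabbrev on-; the comments say what I
would expect, in line with the log in the question.

```stata
clear
input year unemp gdpgap
1986 6.6 -.00888
1987 5.7  .00569
1988 5.3  .01402
1989 5.4  .01267
1990 6.3 -.01081
end

* The scalar named y does receive the fifth observation of unemp.
scalar y = unemp[5]

* Rules 1 and 2: y is read as an abbreviation of the variable year.
* Rule 3: a bare variable name here means its first observation.
display y
display year[1]

* Insisting on the scalar shows the value that was stored.
display scalar(y)
```

The scalar holds unemp[5] all along; only the unqualified -di y- is read
as year[1].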
I think there are three morals.
(a) If you want to use scalars rather than locals, always use a tempname
for every scalar. That's a little more complicated, but safer. In a
proper program you really always should use tempnames to avoid possible
clashes like this.
(b) Alternatively, or additionally, you can insist on the scalar
interpretation by using -scalar()-, as in -di scalar(y)-.
(c) Every non-trivial language bites if you can't see through all the
implications of all the rules, and almost none of us can. There could be
a good article for the Stata Journal on the top twenty gotchas that bite
like this, so part of the issue is just how prominently stuff like this
is documented.
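Morals (a) and (b) can be sketched as follows, again as a do-file and
again assuming the first five observations from the question. The names
u5 and change are just illustrative. The second scalar also shows one
scalar depending on two different observations, with no macro in
between.

```stata
clear
input year unemp
1986 6.6
1987 5.7
1988 5.3
1989 5.4
1990 6.3
end

* tempname stores a temporary name in each local macro, one that
* should not clash with (or abbreviate) any variable name.
tempname u5 change

scalar `u5' = unemp[5]
scalar `change' = unemp[5] - unemp[4]

* Both forms refer to the scalars; scalar() makes that explicit.
display `u5'
display scalar(`change')
```

Scalars named this way disappear when the do-file or program finishes,
which is usually what you want in a proper program.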
Nick
D H
> correct?) [Stata 8.2]
>
> To my surprise, using brackets to indicate observation
> number works with local and global macros, but not
> with the scalar command.
>
> As an example, consider the following dataset:
>
> . list
>
> +------------------------+
> | year unemp gdpgap |
> |------------------------|
> 1. | 1986 6.6 -.00888 |
> 2. | 1987 5.7 .00569 |
> 3. | 1988 5.3 .01402 |
> 4. | 1989 5.4 .01267 |
> 5. | 1990 6.3 -.01081 |
> |------------------------|
> 6. | 1991 7.3 -.02822 |
> 7. | 1992 7.4 -.01411 |
> 8. | 1993 6.5 -.01542 |
> <snip>
>
> . global yy=unemp[5]
>
> . di $yy
> 6.3000002
>
> So far so good (ignoring the problem with the float
> datatype). Now try the same with a scalar command:
>
> . scalar y=unemp[5]
>
> . di y
> 1986
>
> Here, Stata inexplicably appears to insert the upper
> left hand observation from the dataset, which happens
> to correspond to "year" rather than "unemp".
>
> The following will work properly though:
>
> . scalar z=unemp in 5
>
> . di z
> 6.3000002
>
> Any comments?
>
> What if I want to make a scalar a function of two
> different observations? (I suppose in the latter case
> I could grab the observations with a local macro, then
> insert the values into the scalar, but then I would
> lose any precision or speed advantages of the scalar
> method).
>
> As the fine manual notes, macros provide up to 13
> significant digits in scientific notation, while
> scalars provide more. Macros are also somewhat
> slower. If neither of these factors is especially
> important, perhaps the casual programmer might want to
> stick with macros.