Re: st: ambiguity in -if- qualifier
From: "Yu Chen, PhD" <[email protected]>
To: [email protected]
Subject: Re: st: ambiguity in -if- qualifier
Date: Mon, 24 Mar 2014 18:01:41 -0500
Hi, Nick,
Thank you very much for the explanation. You mentioned in the Remarks
of -mvsumm- (SSC) that there are two possible interpretations: a weak
interpretation and a strong interpretation. You chose the weak
interpretation in developing -mvsumm-.
Do you know whether the weak interpretation is followed consistently
in Stata's official commands? If some official commands employ the
weak interpretation but others employ the strong interpretation, that
is a potential trap for those unaware of the distinction.
Thank you.
Yu
On Mon, Mar 24, 2014 at 12:06 PM, Nick Cox <[email protected]> wrote:
> The reason for your puzzlement is becoming much clearer, so thanks for
> providing an example that can be discussed.
>
> Note, however, that your initial word description -- in your first
> paragraph -- does not fully match your code example, as your code
> example bites for a quite specific reason, which only the code makes
> clear.
>
> Naturally, Stata can calculate the previous value of a time series if
> the previous observation is present in the dataset, but not otherwise.
> (Similar remarks apply to the effects of any time series operator or
> subscripting where such imply reaching outside the observations
> selected by -if-.)
>
> Said differently, -if- selects observations to be used, but neither
> the -if- qualifier nor any other part of the syntax is thereby
> prohibited from invoking information in the other part of the data set
> whenever -if- selects a strict subset.
>
> But the problem here is not that Stata is being ambiguous, or
> inconsistent, or incorrect, but that users need to ask for what they
> want and want what they ask for.
>
> In your example, which we can all agree to be frivolous, you in effect
> carry out a regression on part of a panel and **part of what you
> calculate depends on values outside the data used**. That's at best
> dubious and at worst meaningless, but either way the decision to do
> that is yours, not Stata's.
>
> Otherwise put, it's your code that says "use lagged values for part of
> the data" and Stata does what it is told to the best of its ability.
> It's a robot and you are its instructor, in this example at least.
>
> I agree with you that people need to think about cases like this.
> Indeed, if you look at the help file for -mvsumm- (SSC) you will see
> "Remarks" written (by me, as it happens) on this very point in 2005.
>
> There are many other examples. Here is another.
>
> sysuse auto , clear
>
> gen mpg2 = mpg/_N if foreign
>
> keep if foreign
> gen mpg3 = mpg/_N
>
> -mpg2- and -mpg3- are quite different, as _N is the number of
> observations in the current dataset.
>
> The only clear rule needed here is to ask for exactly what you want.
>
> Nick
> [email protected]
>
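A minimal sketch of the time-series case described above, assuming a
made-up three-observation series (the data and the variable names y,
lag_if and lag_keep are illustrative, not from the thread). With -if-,
L.y can still reach the excluded observation; after -keep-, that
observation is gone and the lag is missing.

```stata
clear
input t y
1 10
2 20
3 30
end
tsset t

* -if- restricts which observations receive a value,
* but L.y may still look back at t == 1
gen lag_if = L.y if t >= 2

* drop t == 1 first: now no previous value exists for t == 2
keep if t >= 2
gen lag_keep = L.y

list t y lag_if lag_keep
```

At t == 2, lag_if is 10 while lag_keep is missing; at t == 3 both are 20.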