Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: ambiguity in -if- qualifier
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: ambiguity in -if- qualifier
Date: Tue, 25 Mar 2014 00:46:39 +0000
What the -mvsumm- help calls the "weak" interpretation will always be
followed unless you intervene afterwards to -replace- values that use
information outside the -if- restriction (or, equivalently, reduce the
dataset to the observations selected by -if-).
That's much of the point of those comments! The rest of the point is
just to underline that that is what Stata does.
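A minimal sketch of the difference, using made-up data rather than anything from -mvsumm- itself (the variables t, y, lag_weak and lag_strong are hypothetical names chosen for illustration). Under the weak interpretation the lag at the first selected time is taken from an observation that -if- excluded; reducing the dataset first gives the strong interpretation instead.

```stata
* Hypothetical toy data: a single time series with t = 1, ..., 6
clear
set obs 6
gen t = _n
gen y = t^2
tsset t

* Weak interpretation: -if- decides which observations receive a result,
* but L.y at t == 4 is still read from t == 3, which -if- excluded.
gen lag_weak = L.y if t >= 4

* Strong interpretation, obtained by reducing the dataset first:
* at t == 4 there is no previous observation left, so L.y is missing.
keep if t >= 4
gen lag_strong = L.y

list t y lag_weak lag_strong
```

The two variables should differ only at t == 4, the one selected observation whose lag lies outside the -if- restriction.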
Nick
[email protected]
On 24 March 2014 23:01, Yu Chen, PhD <[email protected]> wrote:
> Hi, Nick,
> Thank you very much for the explanation. You mentioned in the Remarks
> of -mvsumm- (SSC) that there are possibly two interpretations: a weak
> interpretation and a strong interpretation. You chose to use the weak
> interpretation in developing -mvsumm-.
> Do you know whether this weak interpretation is consistently followed
> by Stata in developing its official commands? If some official
> commands employ the weak interpretation, but others employ the strong
> interpretation, that will be a potential trap for those unaware of the
> distinction.
> Thank you.
>
> Yu
>
>
>
> On Mon, Mar 24, 2014 at 12:06 PM, Nick Cox <[email protected]> wrote:
>> The reason for your puzzlement is becoming much clearer, so thanks for
>> providing an example that can be discussed.
>>
>> Note, however, that your initial word description -- in your first
>> paragraph -- does not fully match your code example, as your code
>> example bites for a quite specific reason, which only the code makes
>> clear.
>>
>> Naturally, Stata can calculate the previous value of a time series if
>> the previous observation is present in the dataset, but not otherwise.
>> (Similar remarks apply to the effects of any time series operator or
>> subscripting where such imply reaching outside the observations
>> selected by -if-.)
>>
>> Said differently, -if- selects observations to be used, but neither
>> the -if- qualifier nor any other part of the syntax is thereby
>> prohibited from invoking information in the other part of the data set
>> whenever -if- selects a strict subset.
>>
>> But the problem here is not that Stata is being ambiguous, or
>> inconsistent, or incorrect, but that users need to ask for what they
>> want and want what they ask for.
>>
>> In your example, which we can all agree to be frivolous, you in effect
>> carry out a regression on part of a panel and **part of what you
>> calculate depends on values outside the data used**. That's at best
>> dubious and at worst meaningless, but either way the decision to do
>> that is yours, not Stata's.
>>
>> Otherwise put, it's your code that says "use lagged values for part of
>> the data" and Stata does what it is told to the best of its ability.
>> It's a robot and you are its instructor, in this example at least.
>>
>> I agree with you that people need to think about cases like this.
>> Indeed, if you look at the help file for -mvsumm- (SSC) you will see
>> "Remarks" written (by me, as it happens) on this very point in 2005.
>>
>> There are many other examples. Here is another.
>>
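>> sysuse auto, clear
>>
>> * _N is 74 here: all cars count, although only foreign cars get a value
>> gen mpg2 = mpg/_N if foreign
>>
>> keep if foreign
>> * _N is now 22: only the foreign cars remain in the dataset
>> gen mpg3 = mpg/_N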
>>
>> -mpg2- and -mpg3- are quite different, as _N is the number of
>> observations in the current dataset.
>>
>> The only clear rule needed here is to ask for exactly what you want.
>>
>> Nick
>> [email protected]
>>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/