Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: ambiguity in -if- qualifier
From: John Antonakis <[email protected]>
To: [email protected]
Subject: Re: st: ambiguity in -if- qualifier
Date: Tue, 25 Mar 2014 10:38:12 +0100
Brilliant, Nick!
__________________________________________
John Antonakis
Professor of Organizational Behavior
Director, Ph.D. Program in Management
Faculty of Business and Economics
University of Lausanne
Internef #618
CH-1015 Lausanne-Dorigny
Switzerland
Tel ++41 (0)21 692-3438
Fax ++41 (0)21 692-3305
http://www.hec.unil.ch/people/jantonakis
Associate Editor:
The Leadership Quarterly
Organizational Research Methods
__________________________________________
On 25.03.2014 10:21, Nick Cox wrote:
> I think this example highlights the core of Yu Chen's concern. I
> reverse Yu's style and present a plausible example in a facetious
> manner.
>
> Question. Professor Nobelordie is teaching an advanced econometrics
> class, "Testing for heteros{c|k}edasticity under a full moon". He
> presents students with a dataset for 1900-2012 but for reasons
> compelling to economists tells them to use only data from 1970 on to
> build an autoregressive model predicting something of interest.
>
> Students Strict and Weak attempt this problem. Student Strict starts
> out by
>
> keep if year >= 1970
>
> and then fits her model. Student Weak omits this step but carefully puts
>
> if year >= 1970
>
> on all his statements. They get different results. Explain why, and
> apportion blame between
>
> (a) Professor Nobelordie
>
> (b) Student Strict
>
> (c) Student Weak
>
> (d) Stata.
>
> Answer. Student Strict is reasoning "only use data from 1970 on", but
> after the -keep-, L1. values are not available for 1970 because 1969
> is no longer in the dataset, L2. values are not available for 1970 or
> 1971 for the same reason, and so on. Student Weak can use more data
> (much more if there are several lagged terms in his model). Provided
> they keep and show their code, the discrepancy can be unearthed and
> explained.
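A minimal Stata sketch of the two students' approaches, assuming simulated data: the series `y`, the seed, and the AR(1) specification are hypothetical stand-ins for Professor Nobelordie's dataset. With a single lag, Student Weak's regression should use 43 observations (1970-2012) and Student Strict's 42 (1971-2012), because only the -keep- route loses the 1969 value.

```stata
* Hypothetical annual series for 1900-2012 (simulated, not real data)
clear
set obs 113
generate year = 1899 + _n
tsset year
set seed 2014
generate y = rnormal()

* Student Weak: -if- restricts the estimation sample,
* but L.y for 1970 is still read from the 1969 observation
regress y L.y if year >= 1970
display e(N)    // 43

* Student Strict: after -keep-, 1969 no longer exists,
* so L.y is missing for 1970 and that year drops out
preserve
keep if year >= 1970
regress y L.y
display e(N)    // 42
restore
```

Each further lag in the model widens the gap by one more observation.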
>
> Professor Nobelordie is guilty of a vague instruction, unless the
> point of the question was for students to discover the ambiguity the
> hard way.
>
> Stata is blameless. It just sits there, trying very hard to do what
> it's told. -if- pushes one way, time series operators push another
> way.
>
> Nick
> [email protected]
>
>
> On 25 March 2014 00:46, Nick Cox <[email protected]> wrote:
>> What the -mvsumm- help calls the "weak" interpretation will always be
>> followed unless you intervene afterwards to -replace- values that use
>> information outside the -if- restriction (or, equivalently, reduce the
>> dataset to the observations selected by -if-).
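A sketch of intervening afterwards with -replace- to get the "strong" interpretation for a one-period lag, assuming a hypothetical simulated annual series `y` for 1900-2012 and a selection of `year >= 1970`. The condition `L.year < 1970` flags values whose lag reaches outside the -if- selection.

```stata
* Hypothetical annual series for 1900-2012 (simulated, not real data)
clear
set obs 113
generate year = 1899 + _n
tsset year
set seed 2014
generate y = rnormal()

* Weak interpretation (what Stata does): the 1970 lag uses 1969
generate lagy = L.y if year >= 1970

* Strong interpretation: blank out values whose lag lies
* outside the observations selected by -if-
replace lagy = . if L.year < 1970

count if !missing(lagy)    // 42
```

With more lags, or a moving summary over a window, the same idea applies: blank out every result whose window extends before the first selected observation.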
>>
>> That's much of the point of those comments! The rest of the point is
>> just to underline that that is what Stata does.
>>
>>
>> Nick
>> [email protected]
>>
>>
>> On 24 March 2014 23:01, Yu Chen, PhD <[email protected]> wrote:
>>> Hi, Nick,
>>> Thank you very much for the explanation. You mentioned in the Remarks
>>> of -mvsumm- (SSC) that there are possibly two interpretations: a weak
>>> interpretation and a strong interpretation. You chose to use the weak
>>> interpretation in developing -mvsumm-.
>>> Do you know whether this weak interpretation is consistently followed
>>> by Stata in developing its official commands? If some official
>>> commands employ the weak interpretation, but others employ the strong
>>> interpretation, that will be a potential trap for those unaware of the
>>> distinction.
>>> Thank you.
>>>
>>> Yu
>>>
>>>
>>>
>>> On Mon, Mar 24, 2014 at 12:06 PM, Nick Cox <[email protected]> wrote:
>>>> The reason for your puzzlement is becoming much clearer, so thanks for
>>>> providing an example that can be discussed.
>>>>
>>>> Note, however, that your initial word description -- in your first
>>>> paragraph -- does not fully match your code example, as your code
>>>> example bites for a quite specific reason, which only the code makes
>>>> clear.
>>>>
>>>> Naturally, Stata can calculate the previous value of a time series if
>>>> the previous observation is present in the dataset, but not otherwise.
>>>> (Similar remarks apply to the effects of any time series operator or
>>>> subscripting where such imply reaching outside the observations
>>>> selected by -if-.)
>>>>
>>>> Said differently, -if- selects observations to be used, but neither
>>>> the -if- qualifier nor any other part of the syntax is thereby
>>>> prohibited from invoking information in the other part of the dataset
>>>> whenever -if- selects a strict subset.
>>>>
>>>> But the problem here is not that Stata is being ambiguous, or
>>>> inconsistent, or incorrect, but that users need to ask for what they
>>>> want and want what they ask for.
>>>>
>>>> In your example, which we can all agree to be frivolous, you in effect
>>>> carry out a regression on part of a panel and **part of what you
>>>> calculate depends on values outside the data used**. That's at best
>>>> dubious and at worst meaningless, but either way the decision to do
>>>> that is yours, not Stata's.
>>>>
>>>> Otherwise put, it's your code that says "use lagged values for part of
>>>> the data" and Stata does what it is told to the best of its ability.
>>>> It's a robot and you are its instructor, in this example at least.
>>>>
>>>> I agree with you that people need to think about cases like this.
>>>> Indeed, if you look at the help file for -mvsumm- (SSC) you will see
>>>> "Remarks" written (by me, as it happens) on this very point in 2005.
>>>>
>>>> There are many other examples. Here is another.
>>>>
>>>> sysuse auto , clear
>>>>
>>>> gen mpg2 = mpg/_N if foreign
>>>>
>>>> keep if foreign
>>>> gen mpg3 = mpg/_N
>>>>
>>>> -mpg2- and -mpg3- are quite different, as _N is the number of
>>>> observations in the current dataset.
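If the intent is to divide by the number of foreign cars without dropping the domestic ones, one way (a sketch, not the only one) is to count the selected observations first and use the stored result r(N). The name `mpg4` and the local `nforeign` are hypothetical. In the auto data, 22 of the 74 cars are foreign.

```stata
sysuse auto, clear

* Count the observations that -if- selects and store the count
count if foreign
local nforeign = r(N)

* Divide by the size of the selection, not by _N (the whole dataset)
generate mpg4 = mpg/`nforeign' if foreign

* Check against the -keep- route, where _N is the size of the selection
preserve
keep if foreign
generate mpg3 = mpg/_N
assert mpg3 == mpg4
restore
```

This asks for exactly what is wanted: the denominator is stated explicitly rather than left to depend on which observations happen to be in memory.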
>>>>
>>>> The only clear rule needed here is to ask for exactly what you want.
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/