Re: st: ambiguity in -if- qualifier
From: "Yu Chen, PhD" <[email protected]>
To: [email protected]
Subject: Re: st: ambiguity in -if- qualifier
Date: Sat, 22 Mar 2014 19:44:57 -0500
Hi, Nick,
Let me clarify. Any assignment to a new variable involves two
steps. In Step 1, the expression is evaluated; in Step 2, the
result of the evaluation is assigned to the new variable. My question
is: which sample is used in each step?
For -generate-, Step 1 uses the full sample. In other words, all
observations, regardless of whether they meet the -if- condition, can be
used in evaluating the expression. In Step 2, however, -generate-
assigns the result only to the subsample that meets the -if- condition.
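A minimal sketch of the behaviour described above, assuming the auto data shipped with Stata, which is sorted by -foreign- so that observation 52 is the last domestic car and observation 53 the first foreign one. The variable name mpg2 follows the example quoted later in this thread. One reading, offered with some caution: an explicit subscript such as mpg[_n-1] refers to observation numbers in the data in memory, so it can reach observations that fail the -if- condition.

```stata
* Load the example data; obs 1-52 are domestic, 53-74 foreign (assumed ordering)
sysuse auto, clear

* The -if- qualifier limits which observations RECEIVE a value ...
generate mpg2 = mpg[_n-1] if foreign == 1

* ... but the subscript [_n-1] still points at the previous observation
* in the full dataset, including domestic cars
list make mpg mpg2 foreign in 52/54

* Every foreign car gets the mpg of the observation just before it
assert mpg2 == mpg[_n-1] if foreign == 1

* Domestic cars are left missing
assert missing(mpg2) if foreign == 0
```

Observation 53 therefore holds the mpg of observation 52, a domestic car, which is the value 24 reported in the original example.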
However, other commands may already use a subsample in Step 1.
In other words, before the command does anything, the sample is
reduced according to the -if- condition, so everything else
the command does operates on this reduced sample. It seems to me
that most commands work this way. But I found that -generate- is an
exception: it does not restrict the sample until the last step.
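For contrast, a sketch of what "restrict first" looks like, again assuming the shipped auto data. The -egen- call computes its mean over the observations that satisfy -if-. The -preserve-/-keep- sequence is only an illustration of reducing the sample before anything else happens, not a claim about what any command does internally. The names mean_f and mpg3 are hypothetical.

```stata
sysuse auto, clear

* -egen-: the mean is taken over foreign cars only and stored for those cars
egen mean_f = mean(mpg) if foreign == 1
summarize mpg if foreign == 1, meanonly
assert abs(mean_f - r(mean)) < 1e-4 if foreign == 1
assert missing(mean_f) if foreign == 0

* Emulating "reduce the sample first" for the lag example
preserve
keep if foreign == 1
generate mpg3 = mpg[_n-1]
* The first foreign car now has no predecessor, so its lag is missing
assert missing(mpg3) in 1
restore
```

Under this emulation the first foreign car gets a missing lag, whereas -generate- with -if- on the full data gives it the preceding domestic car's value.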
I think this is a little confusing. At the least, commands are not
consistent about when the sample is restricted.
Thank you.
On Sat, Mar 22, 2014 at 6:45 PM, Nick Cox <[email protected]> wrote:
> I don't think the one precise example here is puzzling in any sense.
> Previous values of -mpg- are put in a new variable if and only if
> -foreign- is 1. This is calculated observation by observation.
>
> You allude to different behaviour with -egen-. But the help for -egen- explains
>
> "Explicit subscripting (using _N and _n), which is commonly used with
> generate, should not be used with egen; see subscripting."
>
> That may illuminate your puzzlement.
>
> Nick
> [email protected]
>
>
> On 22 March 2014 21:26, Yu Chen, PhD <[email protected]> wrote:
>> I think there is some ambiguity in the meaning and usage of the -if-
>> qualifier. Generally, the command is performed on a subset that meets
>> the -if- condition. However, a command may perform many tasks, and the
>> subset for each task is not clear sometimes. For example, for the
>> -generate- command, it seems to calculate the result of the expression
>> on the full sample first, and then that result is assigned to a
>> subsample that meets the -if- condition. However, for the -egen-
>> command, the calculation is performed on a subset that meets the -if-
>> condition, not the full sample, and then that result is assigned to
>> the new variable on that subsample.
>>
>> For example, see the code below.
>>
>> sysuse auto
>> gen mpg2=mpg[_n-1] if foreign==1
>>
>> Notice that observation number 53 has a value of 24 for mpg2. This
>> indicates that the task of taking a lagged value is performed on the
>> full sample first. Otherwise, this value should be missing. But -egen-
>> works differently.
>>
>> There may exist other cases that have similar ambiguities. I would
>> suggest that Stata have a clear rule to address this issue. If the
>> rule is already out there, please tell me.
>> Thank you very much.
>>
>> Yu Chen
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
--
Yu Chen, Ph.D.
Assistant Professor of Accounting
A. R. Sanchez, Jr. School of Business, WHTC 218D
Texas A&M International University
5201 University Boulevard
Laredo, Texas 78041-1900
USA
956-326-2513 (office)
956-326-2479 (fax)