Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: ambiguity in -if- qualifier
From
"Yu Chen, PhD" <[email protected]>
To
[email protected]
Subject
st: ambiguity in -if- qualifier
Date
Sat, 22 Mar 2014 16:26:10 -0500
I think there is some ambiguity in the meaning and usage of the -if-
qualifier. In general, a command is carried out on the subset of
observations that satisfies the -if- condition. However, a command may
perform several tasks, and it is not always clear which subset each
task uses. For example, -generate- seems to evaluate the expression on
the full sample first and then assign the result only to the
observations that satisfy the -if- condition. In contrast, -egen-
seems to perform the calculation on the subset that satisfies the -if-
condition, not on the full sample, and then assign the result to the
new variable on that same subset.
For example, see the code below.
sysuse auto, clear
* lagged mpg, assigned only to foreign cars
gen mpg2 = mpg[_n-1] if foreign == 1
list mpg mpg2 foreign in 52/53
Notice that observation 53, the first foreign car, has a value of 24
for mpg2, which is the mpg of observation 52, a domestic car. This
indicates that the lag is taken over the full sample first. If it were
taken within the foreign subsample only, this value would be missing.
But -egen- works differently.
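To make the contrast concrete, here is a minimal sketch using the same
auto data. The variable names (mean_all, mean_for, mpg3) are made up
for illustration, and the last command is only one possible way to get
a lag taken within each group; it relies on auto.dta being sorted by
foreign as shipped.

```stata
sysuse auto, clear

* -generate-: the subscript [_n-1] refers to the full dataset, so the
* first foreign car (obs 53) picks up mpg from obs 52, a domestic car
gen mpg2 = mpg[_n-1] if foreign == 1

* -egen-: the mean is computed only over observations meeting -if-
egen mean_all = mean(mpg)
egen mean_for = mean(mpg) if foreign == 1

* under -by-, _n counts within each group, so the first foreign car
* gets a missing lag instead of a value from a domestic car
by foreign: gen mpg3 = mpg[_n-1] if foreign == 1

list mpg mpg2 mpg3 mean_all mean_for in 52/54
```

If this behaves as I expect, mpg2 and mpg3 should differ only in
observation 53, and mean_for should be the mean of mpg among foreign
cars rather than the overall mean.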
There may be other cases with similar ambiguities. I would suggest
that Stata adopt a clear rule to address this issue. If such a rule
already exists, please point me to it.
Thank you very much.
Yu Chen