st: Re: Bug in -use- or -if- ?
Sergiy, I believe that your colleague is correct in how Stata interprets the
underscore variable, _n. The help file states, "_n contains the number of
the current observation." And it also appears to qualify -if- according to
the same criterion while -use- reads data in from a dataset file. If you're
loading a dataset from a disc file, _n is incremented as each observation's
record is read into memory. So, -if _n <= 37- will work, because _n will
increase from zero to 37 as further records are loaded and _n == 1, 2, 3, etc.
tests True as being less than or equal to 37. But, starting from -clear- (with _n equal
to zero), -if _n > 37- will never be True, because, as each candidate
observation record is read into memory for testing of the condition, _n would
only ever be equal to one, which is never greater than 37. And because the
condition tests as False at each test of -if _n > 37-, each successive
candidate record in the file on disc will be rejected--no observation records
will ever be read into memory.
The same holds for -if inrange(_n, 2, 20)-; starting with _n equal to zero
(empty in-memory dataset), _n will only be at most one as each successive
record is read and tested for the truth of -inrange(_n, 2, 20)-. _n will
never be between 2 and 20 and so each successive candidate record will be
rejected, leaving a dataset in memory of zero observations at the end.
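The reasoning above can be mimicked with a toy loop. This is only a sketch of the explanation, not Stata's actual loading code: it assumes that, for each candidate record, _n equals the number of observations already kept plus one. The macro names kept, rec, and n are made up for illustration.

```stata
* Toy model: 74 candidate records, as in auto.dta.
* Assumption: _n for a candidate = observations already in memory + 1.
local kept = 0
forvalues rec = 1/74 {
    * _n as the candidate record would see it under the assumption
    local n = `kept' + 1
    if inrange(`n', 2, 20) {
        * condition is true: the record is kept
        local ++kept
    }
}
display "inrange(_n, 2, 20) keeps `kept' of 74 records"
```

Under this assumption the loop ends with kept equal to 0. Swapping the condition for `n' <= 37 gives 37, for `n' > 37 gives 0, and for inrange(`n', 1, 20) gives 20, which matches the counts reported in the quoted message.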
Joseph Coveney
Sergiy Radyakin wrote:
In a different thread Dan Blanchette asked about cooperation of -in-
and -if-. I have asked myself a slightly different question: whether
specifying if-conditions can always substitute for in-conditions, e.g.
instead of "in #A/#B" one can type "if inrange(_n,#A,#B)".
There seems to be a bug in -use- that gets confused by such a
condition. My colleague has suggested that this might happen because
Stata will qualify _n according to the current dataset in memory, but
qualify -if- for the dataset during the load. I was able to come up
with an example where it gets confused regardless of the dataset
currently in memory. It seems that the condition "larger" is not
evaluated properly in this case.
*** bug with use ... if F(_n)
*** N(auto.dta)=74
sysuse auto, clear
local fullauto `r(fn)'
use `"`fullauto'"' in 1/37, clear
count
assert (_N==37)
use `"`fullauto'"' in 38/74, clear
count
assert (_N==37)
use `"`fullauto'"' if _n<=37, clear
count
assert (_N==37)
* this is the case that misbehaves: the assert below fails
use `"`fullauto'"' if _n>37, clear
count
assert (_N==37)
It is hard to understand what Stata will think of _n while loading
data, but it is definitely not the observation number.
Strangely the condition inrange(_n,1,20) loads 20 (twenty)
observations, but inrange(_n,2,20) loads 0 (zero).
So if you ever try to work with large datasets in smaller portions,
slice them with an in-condition, not an if-condition!
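A minimal sketch of slicing with an in-condition, reusing the auto.dta example from above. The chunk size of 20 and the macro names total, chunk, first, and last are arbitrary choices for illustration, and the total of 74 is hard-coded from the note in the example rather than read from the file.

```stata
* Locate auto.dta the same way as in the example above.
sysuse auto, clear
local fullauto `r(fn)'

local total = 74    // N(auto.dta), hard-coded for this sketch
local chunk = 20    // hypothetical slice size

forvalues first = 1(`chunk')`total' {
    * the last slice may be shorter than the others
    local last = min(`first' + `chunk' - 1, `total')
    use `"`fullauto'"' in `first'/`last', clear
    * each slice should hold exactly the requested range
    assert _N == `last' - `first' + 1
    display "observations `first' to `last' loaded, _N = " _N
}
```

Each pass replaces the data in memory with one slice, so any per-slice processing would go inside the loop after -use-.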
Stata MP for Windows, v10.1.551 born 02 Feb 2009 (currently the latest.
This recent update brings some very welcome changes: thank you!)
Best regards, Sergiy Radyakin
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/