It would be interesting to know, in broad terms, what Stata does
internally. Setting aside efficiency for the moment, consider
commands like

    drop if mod(_n, 2)

or

    drop if y == . | y[_n-1] == .

in which the decision to -drop- an observation depends on _n.
From examples like these, it seems that the dropping cannot start
before the identification of observations to be dropped has finished:
otherwise _n and y[_n-1] would change partway through the command.
I've not seen the code, so this is just a guess.
But David is surely right that two commands must entail at a minimum
two loops over the observations (and perhaps even four, if each
command makes one pass to identify and another to drop).
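
The guess above can be made concrete with a two-step version that
separates identifying from dropping. This is only a sketch of the
idea, not a description of Stata's internals; the toy data and the
variable name todrop are made up for illustration.

```stata
* Hypothetical toy data: y with a few missing values
clear
input y
1
.
3
4
.
6
end

* Pass 1: flag observations while _n still refers to the original order
generate byte todrop = y == . | y[_n-1] == .

* Pass 2: remove the flagged observations, then the flag itself
drop if todrop
drop todrop
list
```

With these data only the observation with y equal to 4 survives the
two-step version, because y[_n-1] is missing for the first observation.
If the one-line -drop if y == . | y[_n-1] == .- leaves the same
observation, that is consistent with identification finishing before
any dropping starts.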
Nick
David Kantor wrote:
I would think that the latter is more efficient, especially with
large datasets. You incur the cost of parsing and executing a command
once rather than twice (the expression is more complex, but I don't
suppose that matters much). Furthermore, the latter may be
considerably more efficient if there are many observations with b==.
that do not have a==. The reason is that when you drop observations,
there is, I suppose, a moving of records to close up the holes. With
the two-command method, some records will be moved twice rather than
once. I suppose it makes little difference for small datasets.
You can also -set rmsg on- and run some experiments.
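
One possible form for such an experiment is sketched below. The
dataset size, the seed, and the proportions of missing values are
arbitrary assumptions, and runiform() and rnormal() assume a
reasonably recent Stata. With -set rmsg on-, Stata reports the elapsed
time after each command, so the two separate -drop- timings can be
added up and compared with the single combined one.

```stata
* Hypothetical test data: 1,000,000 observations, with about 10% of a
* and 30% of b set to missing (all numbers are arbitrary choices)
clear
set obs 1000000
set seed 12345
generate a = cond(runiform() < 0.1, ., rnormal())
generate b = cond(runiform() < 0.3, ., rnormal())

* Two-command method; -preserve- keeps a copy so the data can be reused
preserve
set rmsg on
drop if a == .
drop if b == .
set rmsg off
restore

* One-command method on the same data
set rmsg on
drop if a == . | b == .
set rmsg off
```

Timings will vary with machine and Stata version, so it is worth
repeating the runs a few times before drawing any conclusion.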
Finally, be aware that a==. is not the general way to test for a
missing value; it tests for equality with one specific missing value,
the system missing value (.), and so does not catch the extended
missing values .a through .z. The general way to test for missing
values is mi(a) or a>=. Of the two, mi(a) is even more general in
that it works for string variables as well.
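
A small illustration of the difference, using made-up data (the
variable s and its values are hypothetical and are there only to show
the string case):

```stata
* Hypothetical data: numeric a with system and extended missing values,
* string s with two empty values
clear
input a str3 s
1  "x"
.  ""
.a "y"
.b ""
end

* Matches only the system missing value: 1 observation
count if a == .

* Matches . as well as .a and .b: 3 observations
count if a >= .

* Same as a >= . for numeric variables: 3 observations
count if mi(a)

* Also works for strings, where missing means empty: 2 observations
count if mi(s)
```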