The fact that -if- is always slower than an equivalent -in-
I call Blasnik's Law, not because Michael discovered it, but
because it needs a good name and he has done more than any
other user to make people aware of it.
Compare

    keep in 1/100          (1)

and

    keep if _n <= 100      (2)

and imagine Stata implementing each of these. You
should be able to tell at a glance that they mean
the same thing, but you're a human, and humans are
good at working out meanings.
With (1), Stata can see at once that it should -keep-
the first 100 obs and -drop- everything else.
With (2), Stata is obliged by its own rules to test
every observation number _n against <= 100, and
to ask itself lots of questions like

    _n is 2345. Is that <= 100? No.
    So, don't -keep- this obs.
    ....
    _n is 123456789. Is that <= 100? No.
    So, don't -keep- this obs.
    _n is 123456790. Is that <= 100? No.
    So, don't -keep- this obs.

and so on, because it has no intelligence to see
that once it is past 100, further testing is
futile.
Hence the rule: Use -in- rather than -if- when they
are equivalent. Remember that with -if- Stata tests
_every_ observation to check whether the condition is
true, utterly regardless of whether it is "obvious"
that it need not do that. Stata doesn't do "obvious".
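If you want to see this on your own machine, one rough
check is to time both commands on a large dataset. What
follows is only a sketch: the 10 million observations and
the variable x are arbitrary choices, -runiform()- assumes
Stata 10.1 or later, and the timings you get will depend on
your machine and version. Run it as a do-file (the // comments
are not allowed interactively), and reduce the number of
observations if memory is tight.

```stata
* Sketch: time -keep in- against the equivalent -keep if-
* (10 million obs and variable x are arbitrary, for illustration)
clear
set obs 10000000
generate double x = runiform()

timer clear

* (1) -in-: Stata goes straight to obs 1 to 100
preserve
timer on 1
keep in 1/100
timer off 1
assert _N == 100
restore

* (2) -if-: the condition _n <= 100 is evaluated for every obs
preserve
timer on 2
keep if _n <= 100
timer off 2
assert _N == 100
restore

* compare the elapsed times for timers 1 and 2
timer list
```

The -preserve- and -restore- calls sit outside the timed
sections, so each timer measures only its -keep- command, and
both commands start from the same data. The -assert- lines
confirm that the two commands keep the same 100 observations.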
Nick
Nick Cox
> Interesting. You may get a bit more speed if
> you replace this
>
> egen rank_1 = rank(expression), by(ssrownum)
> egen rank_2 = rank(iso_VSV), by(ssrownum)
> egen corr = corr(rank_1 rank_2), by(ssrownum)
>
> by this:
>
> sort ssrownum
> by ssrownum : egen rank_1 = rank(expression)
> by ssrownum : egen rank_2 = rank(iso_VSV)
> by ssrownum : egen corr = corr(rank_1 rank_2)
>
> The two code segments are equivalent in what
> you end with, but not in how often they -sort-.
>
> Similarly
>
> keep if _n >= `start' & _n <= `stop'
>
> should be faster as
>
> keep in `start'/`stop'
>
> and I would always use the built-in -sqrt()-
> when it applies, rather than powering to 0.5.
>