More important than efficiency, I think, is that the do-file is the record of your editing. Code that references the id will still be easy to understand when you look at it 6 months from now. I might even go one step further and include what you're changing FROM:
replace month = 1 if id==80 & month==4
replace year = 1996 if id==80 & year==1995
replace failed= 1 if id==80 & failed==.
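
A possible extension, offered only as a sketch: -assert- can confirm the old values before anything is changed, so the do-file stops with an error if the data are not what the audit expects. The old values here (4, 1995, missing) are the ones assumed in the lines above.

* stop unless observation(s) with id 80 still hold the expected old values
assert month == 4 & year == 1995 & missing(failed) if id == 80
replace month = 1 if id == 80
replace year = 1996 if id == 80
replace failed = 1 if id == 80
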
Liz
>>> On 10/5/2008 at 12:22 PM, in message
<031173627889364697C50B3B266CBB8A01C08BB8@GEOGMAIL.geog.ad.dur.ac.uk>, "Nick
Cox" <[email protected]> wrote:
> Not so, or at least, it's more complicated than that.
>
> My short answer: On this information, Michael should leave his code as
> is.
>
> My longer answer:
>
> First of all, the indirection of using a local macro is more or less
> irrelevant to efficiency. In fact, if you recode as Martin suggested,
> the code will be a smidgen _slower_, as Stata is obliged to store the
> macro and then interpret it each time it is referenced. However, you
> would have to strain to tell the difference in timings. But remember:
> Stata is not a compiler! Interpretation always implies an overhead, just
> that in many cases it is negligible.
>
> On a style point, I would not use a local macro in this example. I can't
> see what real gain there is in terms of making the code more readable or
> comprehensible, setting aside the efficiency issue.
>
> On a larger issue, -if- is always less efficient than an equivalent -in-
> when there is a direct mapping between statements. What do I mean by
> that?
>
> Suppose you know that there is a single observation, say 5890, for which
> -id- is 80.
>
> Then you could and should code
>
> replace month = 1 in 5890
> replace year = 1996 in 5890
> replace failed= 1 in 5890
>
> if efficiency were your only concern. Given a qualifier, -in 5890-,
> Stata goes straight there, does the work, and bails out. Given a
> qualifier, say -if id == 80-, Stata respects it the slow and stupid way
> and tests every observation to see whether that condition is true or
> false. (It never does the sort of smart thing that people are good at,
> such as noticing whenever observations are ordered by -id- and taking
> that into account.) So, for equivalent actions, -if- is much slower than
> -in-.
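>
> A rough way to see the difference for yourself, as a sketch on made-up
> data rather than a benchmark: the dataset size and the timer numbers 1
> and 2 are arbitrary choices, and -id- is constructed so that
> observation 80 has id 80.
>
> clear
> set obs 1000000
> gen long id = _n
> gen byte month = 4
> timer clear
> * -if- tests the condition on every observation
> timer on 1
> replace month = 1 if id == 80
> timer off 1
> * put the old value back so both timings do the same work
> replace month = 4 in 80
> * -in- goes straight to observation 80
> timer on 2
> replace month = 1 in 80
> timer off 2
> timer list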
>
> This principle is sometimes codified on Statalist, tongue in cheek, as
> Blasnik's Law, because Michael Blasnik has done more than anyone else to
> publicise it.
>
> However,
>
> 1. Efficiency should never be your only concern. Code with -if id == 80-
> is much more transparent than code with -in 5890-. Also, get the
> observation number wrong or mess up the sort order and you have
> introduced a hard-to-find bug.
>
> 2. The "suppose" is a big one. How do you find out the observation
> number if you don't know? You could do something like this
>
> gen long obsno = _n
> su obsno if id == 80, meanonly
> assert r(min) == r(max)
> local where = r(min)
> replace month = 1 in `where'
>
> etc.
>
> But you can see there is a trade-off here. You have to do more work
> beforehand to save work! In practice I would be most unlikely to bother.
> In general being clever like this will not help much and might involve
> extra work. Spending 2 minutes changing the code for 2 ms less machine
> time is usually dopey unless you know that you are going to use that
> code many, many times.
>
> 3. I've taken Michael literally in his implication that only a single
> observation is involved. The test above
>
> assert r(min) == r(max)
>
> tests whether that is so.
>
> At worst, the observations satisfying the -if- don't occur in a single
> block so that -in- is not applicable to the data as they stand. (In
> principle, that is always fixed by -sort-ing. Again in practice, there
> is a trade-off in that -sort-ing may take up considerable machine time
> itself.)
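>
> A sketch of the sorted case, assuming -id- may repeat: the variable
> name obsno and the local names first and last are illustrative only.
> After sorting, the observations with id 80 form one block, so a range
> can be given to -in-.
>
> sort id
> gen long obsno = _n
> su obsno if id == 80, meanonly
> * at least one match, and the matches are contiguous
> assert r(N) > 0
> assert r(N) == r(max) - r(min) + 1
> local first = r(min)
> local last = r(max)
> replace month = 1 in `first'/`last'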
>
> Nick
> [email protected]
>
> (In a later post, Martin introduced what I think is another red herring
> by talking about dialogs. If you care about machine time, don't use
> dialogs.)
>
> Martin Weiss
>
> -replace- expects "oldvar =exp", so no, I do not think there is a more
> efficient way. Multiple instances of the same -if- qualifier always make
> it advisable to throw it into a -local-
>
> local mycond " if id==80"
> replace month = 1 `mycond'
> replace year = 1996 `mycond'
> replace failed= 1 `mycond'
>
> Michael McCulloch
>
> As part of a data audit, I'm recording some changes in my project
> do-file. Would there be a more efficient way to code the following
> changes, all of which involve the same observation?
>
> replace month = 1 if id==80
> replace year = 1996 if id==80
> replace failed= 1 if id==80
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/