
Re: Re: st: processing time

From	n j cox <[email protected]>
To	[email protected]
Subject	Re: Re: st: processing time
Date	Thu, 22 Mar 2007 18:37:45 +0000

It would be interesting to know, in broad terms, what
Stata does. Setting aside
efficiency matters for the moment, consider commands like

drop if mod(_n, 2)

or

drop if y == . | y[_n-1] == .

in which the decision to -drop- an observation depends on _n.
Examples like these suggest that no observation can actually be
dropped until every observation to be dropped has been identified.
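
A small made-up dataset makes the point concrete. The sketch below is not from the original posts: the variable names id and y and the data values are invented for illustration, and it is meant to be run as a do-file. It shows only what the results look like, not how Stata arrives at them internally.

```stata
* Example 1: the condition is evaluated against the original _n
clear
set obs 6
generate id = _n
drop if mod(_n, 2)
* mod(_n, 2) is 1 (true) for odd _n, so original observations 1, 3, 5 go
list id
assert id == 2 * _n

* Example 2: y[_n-1] refers to the original neighbour
clear
input y
1
.
3
4
.
6
end
generate id = _n
drop if y == . | y[_n-1] == .
* y[0] is missing, so observation 1 is dropped as well
list
assert _N == 1
assert id == 4
```

In the first example only the originally even-numbered observations survive; in the second only original observation 4 does. Both results are what one would expect if every observation is tested against the data as they stood before any observation was removed.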

I've not seen the code, so this is just a guess.
But David is surely right that two commands must entail
at a minimum two loops over the observations (and
perhaps even four).

Nick
[email protected]

David Kantor wrote:

I would think that the latter (a single -drop- with the combined
condition) is more efficient, especially with large datasets. You
incur the cost of parsing and executing a command once, rather than
twice (the expression is more complex, but I don't suppose that
matters much). Furthermore, the latter may be considerably more
efficient if there are many cases with b==. that do not have a==.
The reason is that when you drop observations, there is, I suppose,
a moving of records to close up the holes. With the two-command
method, some records will be moved twice, rather than once.

I suppose it makes little difference for small datasets.

You can also -set rmsg on-, and run some experiments.
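
One such experiment might look like the following do-file. It is only a sketch under stated assumptions: the variables a and b follow the names used in this thread, but the dataset size, the seed, and the shares of missing values are invented, and runiform() and rnormal() assume a reasonably recent Stata. With -set rmsg on-, Stata reports the elapsed time after each command, so the two -drop- commands can be compared with the single combined one.

```stata
clear
set seed 12345
set obs 1000000
generate double a = rnormal()
generate double b = rnormal()
* make roughly 10% of a and 30% of b missing (arbitrary shares)
replace a = . if runiform() < 0.1
replace b = . if runiform() < 0.3

* two-command method
preserve
set rmsg on
drop if mi(a)
drop if mi(b)
set rmsg off
scalar n_two = _N
restore

* one-command method
set rmsg on
drop if mi(a) | mi(b)
set rmsg off

* both methods must leave the same number of observations
assert _N == n_two
```

Timings will vary by machine and Stata version, so it is worth repeating the run a few times and with different dataset sizes before drawing any conclusion.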

Finally, be aware that a==. is not the general way to test for
missing values; it tests for equality with one specific missing
value, the system missing value (.), and does not catch the extended
missing values .a to .z. The way to test for missing values in
general is mi(a) or a>=. and mi(a) is more general still in that it
works for string variables as well.
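
The difference between these tests is easy to check on a toy dataset. In this sketch the data values and the string variable s are made up for illustration, and the code is meant to be run as a do-file.

```stata
clear
input a
1
.
.a
.z
end

* a == . matches only the system missing value
count if a == .
assert r(N) == 1

* a >= . and mi(a) match every numeric missing value
count if a >= .
assert r(N) == 3
count if mi(a)
assert r(N) == 3

* mi() also accepts strings; the empty string counts as missing
generate str1 s = cond(mi(a), "", "x")
count if mi(s)
assert r(N) == 3
```

The a>=. test works because Stata sorts all numeric missing values above every nonmissing number, with . below .a through .z.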

