In a recent posting Subhankar said:
The -by- command is so much faster than the -while- command...
If I compare
by month: regress returns factor
vs.
local i = 1
while `i' <= 1000 {
regress returns factor if `i' == month
local i = `i' + 1
}
I find that the -by- command is at least 15-20 times faster than the -while-
loop.
The speed differential here has nothing to do with by vs. while. The clumsy part of your code is the if `i' == month qualifier: Stata must examine EACH observation in the dataset on EVERY pass through the loop. Suppose the data are sorted by month, so that each month's observations are contiguous. Then replacing the if with an in first/last range will speed this up immensely, because Stata reads only the observations in that range. If the number of obs per month is constant, this can be done with a simple counter. If the number of obs per month varies, it is worth passing through the dataset ONCE to set up two integer sequences containing the first and last obs for each month, and referencing those in the in qualifier. That fix will, I imagine, remove most of the speed differential between the two methods.
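A minimal sketch of the in-range idea, assuming the data can be sorted by month. The variables returns, factor and month are from the posting; obsno, lastobs and the macros first and last are hypothetical names introduced here. Only the last-observation sequence is stored, since each block's first observation is one past the previous block's last.

```stata
* Sketch only: obsno, lastobs, first and last are hypothetical names.
sort month

* One pass over the data: record where each month's block ends.
generate long obsno = _n
by month: generate long lastobs = obsno[_N]

* Walk the blocks; each regress is restricted to its own range.
local first = 1
while `first' <= _N {
    local last = lastobs[`first']
    display "month = " month[`first']
    regress returns factor in `first'/`last'
    local first = `last' + 1
}

drop obsno lastobs
```

If every month has the same number of observations, say n, the lastobs variable is unnecessary: last is simply first + n - 1, which is the simple counter mentioned above.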
Bottom line: in a large (esp. panel) dataset, never use the if qualifier--especially when you're doing some sort of loop over chunks of the data. It is horribly inefficient.
Kit