This is interesting.
My understanding is that 1./2. is the 'preferred' approach, and that -program-s should be reserved for creating new commands, not for running simple batches of commands (which is what do-files are for).
One argument I see against using -program-s instead of multiple/nested do-files is the risk of inadvertently redefining a command. Try for example,
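program define ml
    * stands in for some project-specific task
    di "Do Maria Lisa's tasks here... some data manipulation for example"
end
program define analysis
    ssc install dagumfit
    sysuse auto, clear
    * -dagumfit- calls the official -ml- internally
    dagumfit price
end
ml
analysis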
This fails because the user-defined -ml- inadvertently overrides the official -ml- command, which -dagumfit- uses internally. Not only does -dagumfit- stop working, but the user-defined -ml- is executed where it should not be, and this can be a serious issue if it does something unwanted to the data, for example.
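To see and undo this kind of clash, something along these lines should work. It is only a sketch: the toy -ml- below is a stand-in for the one in the example above, and I am assuming that dropping the in-memory program is enough for Stata to autoload the official ml.ado again on next use.

```stata
* Self-contained sketch: define a toy -ml-, inspect it, then remove it.
capture program drop ml      // start clean if a copy is already in memory
program define ml
    display "user-written ml, not the official one"
end
program list ml              // shows the definition currently held in memory
program drop ml              // remove it; official ml.ado is autoloaded on next use
```

After the -program drop-, commands such as -dagumfit- that rely on the official -ml- should behave normally again.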
This means that one would need to be careful and constantly check for name conflicts whenever strategy 3./4. is adopted. So advising strategy 1./2. (with do-files) seems in general safer, in particular for novice users who may not notice the perils.
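One way to check a candidate name before defining a program is -which-, wrapped in -capture-. This is a rough sketch, not a complete safeguard: myproj_cleanup is a hypothetical name used only for illustration, and as far as I know -which- looks only at built-in commands and ado-files along the adopath, so it will not flag a program that exists only in memory.

```stata
* Check whether candidate program names clash with existing commands.
* myproj_cleanup is a hypothetical name, chosen with a project prefix.
foreach candidate in ml myproj_cleanup {
    capture which `candidate'
    if _rc == 0 {
        display as error "`candidate' is already a command or ado-file; pick another name"
    }
    else {
        display as text "`candidate' appears to be free"
    }
}
```

Giving every project program a common prefix (myproj_ here) makes such clashes much less likely in the first place.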
But I'd be curious to read arguments for/against this claim. (The timing argument reported here is one!)
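For anyone who wants to repeat the timing comparison on their own project, -timer- is probably the simplest tool. This is a minimal sketch: the two do-file names are hypothetical placeholders (commented out so that the snippet runs as is), and it is not the setup Gabi actually used.

```stata
* Time two ways of organizing the same work.
timer clear
timer on 1
* do master_dofiles.do       // hypothetical: strategy 2, nested do-files
timer off 1
timer on 2
* do master_programs.do      // hypothetical: strategy 4, do-files defining programs
timer off 2
timer list                   // elapsed seconds for timers 1 and 2
```

Uncomment the two -do- lines and point them at the real master files to get comparable figures.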
Philippe
> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Gabi Huiber
> Sent: Thursday, September 25, 2008 10:48 PM
> To: [email protected]
> Subject: Re: st: RE: do-files as programs
>
> Martin, thank you for the pointers. For all they're worth, here are my
> findings:
>
> There are four basic ways, it seems to me, to organize a project in do-
> files:
>
> 1. You just make a list of instructions, executed one after another.
> If you need any of them executed more than once, put them inside a
> foreach loop. That's one do-file, and it will get as long and involved
> as the project demands. It's the way we program when we are new to
> Stata.
>
> 2. You break up the problem into smaller do-files and have a master
> file call them as needed.
>
> Neither of the above makes use of the program feature. Do-files are
> read in as many times as they are used. 2. has certain advantages over
> 1. in ease of debugging and general readability, as short do-files are
> easier to pore over than long ones, but the project will have to rely
> on multiple inter-linked do-files instead of one. My guess is that the
> project doesn't have to be terribly complex before the advantages
> trump this drawback.
>
> 3. You make a do-file organized broadly as follows: in the first
> section you declare any programs you need, then in the second you
> invoke them as needed.
>
> 4. The programs at 3. above are saved as separate do-files, and a
> master file calls them in once with the "do file.do" command, then
> executes them as many times as needed by invoking their name. So, both
> 3. and 4. do make use of this "program" feature.
>
> My test project: I had to tabulate one dummy variable in 30 different
> files, then save the matcells in a master matrix. To make it last a
> little, I ran the same thing twice. I organized the project in the
> four ways above:
>
> 1. One do-file with no programs in it;
> 2. Four separate do-files organized as a master calling the other
> three twice each;
> 3. One do-file with three programs in it, each invoked twice and
> finally
> 4. Four do-files, where the master was calling in the other three
> once, then invoking them each twice.
>
> The results are as follows: 3 was fastest at 6 seconds, followed by 4
> at 9 seconds or so. 2 and 1 were about equally bad, some 11-12 seconds
> each.
>
> This suggests that declaring do-files as programs reduces run time, at
> least in this test. I hope this helps somebody.
>
> Gabi
<SNIP>