On 9/18/08, Man Jia <[email protected]> wrote:
> I was wondering if anyone could share any tips of managing many do
> files for one research project. Thanks for your help!
> ----------------------------------------------
> (1) How to make it easy to find some specific work in several do files?
>
> Now I have to open most of them to see if the file has the part I'm
> looking for. The thing is, it is kind of hard to remember clearly the
> content of each file after even two or three days not working with
> them. I tried to write outline in the beginning of each file, but I
> have to open each of them to get the outline.
> ----------------------------------------
Well, first of all, the files should be clearly separated by project
into different directories. Beyond that, a number of suggestions have
been given; the most common one (I think stemming from some suggestions
by Bill Gould that I first saw 10 years ago or so) is to have separate
files for data preparation, analysis, and results formatting and output,
plus a master file to run all of them. I also try to keep my
filenames consistent: if my project is called TSP (this special
project), then I might have files
  tsp-master.do
  tsp-working-data.do:    creates tsp-working.dta from all the raw data files
  tsp-working-probit.do:  does some analysis of tsp-working.dta
  tsp-working-heckman.do: more analysis of a different kind on tsp-working.dta
  tsp-working-tables.do:  produces the tables (if you work in Word, you
                          would probably shrug your shoulders, but if you
                          work with LaTeX, you will know what I mean :))
  tsp-working-graphs.do:  produces the cute pictures to go into the paper
I think explanations of how to do this in the most Stata-ish and
effective way were provided in some NetCourses, so consider enrolling in
those; they are a great boost to your Stata programming skills.
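
As a rough sketch only (this is not taken from the NetCourses; the
-version- line, the log file name and the run order are my assumptions),
a master file for the hypothetical TSP project above might look like
this, run from the project directory:

```stata
* tsp-master.do : runs the whole (hypothetical) TSP project in order
version 10            // assumption: set this to the Stata version you use
clear
set more off

* keep one log of the full run; the log file name is an assumption
capture log close
log using "tsp-master.log", replace text

do "tsp-working-data.do"       // builds tsp-working.dta from the raw data
do "tsp-working-probit.do"     // analysis of tsp-working.dta
do "tsp-working-heckman.do"    // more analysis of a different kind
do "tsp-working-tables.do"     // tables for the paper
do "tsp-working-graphs.do"     // graphs for the paper

log close
```

With this in place, -do tsp-master- reruns everything, and the master
file itself doubles as the outline of which file does what.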
> (2) detailed comments in do files
> For me it's useful to have detailed comments to explain what I'm
> doing in most parts in a do file. Those details include what the
> commands are doing, anything I should be careful with in future,
> sources I get the ideas, why the alternative ways are not good, summary
> of the results..... But a do file has limitation of number of lines in
> it. So, Writing a lot of comments means I have to create many do files.
Hm. I've always thought that do-files are essentially endless: Stata
reads them line by line, and it does not care whether you have 2 or 2000
lines. At any rate, the longest do-files that I have ever had to deal
with were of two kinds: (1) those reading large data sets and setting all
the labels and such (e.g., the NHIS do-files to input the data are some
100k in size and some 2000 lines), and (2) those running several hundred
regressions with various modifications of the model and multiple
dependent variables (I did not do that, but I've seen the files...).
Again, those had hundreds if not thousands of lines, and with some use
of macros, loops and basic -programs- for the repetitive pieces of five
lines of code (select the variables -- run estimation -- run some
tests -- outreg them or estimates store them), they could be reduced
to a hundred lines. My do-files are rarely longer than a couple of
hundred lines... but most of my recent work is simulations, which
usually break down into chunks quite nicely: a file to loop over the
simulation settings, a file for repetitions within a given set of
parameters, a file to create the data set, and a file to run my
estimation commands and store the results (that's usually the longest
one).
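
To illustrate the point about macros, loops and -programs-, here is a
minimal sketch of wrapping the repetitive "estimate -- test -- store"
piece in a small program and looping over dependent variables. The
program name -runmodel-, the choice of variables and the use of the auto
data shipped with Stata are all assumptions made for illustration:

```stata
* hypothetical example on the auto data that ships with Stata
sysuse auto, clear

capture program drop runmodel
program define runmodel
    * first argument is the dependent variable, the rest are regressors
    gettoken depvar rhs : 0
    regress `depvar' `rhs'
    test `rhs'                    // joint test of all slope coefficients
    estimates store m_`depvar'    // keep the results under a systematic name
end

* one line per model instead of a copied block per model
foreach y of varlist price mpg {
    runmodel `y' weight length foreign
}

* compare the stored results side by side
estimates table m_price m_mpg, se
```

Adding another dependent variable then means adding one word to the
-foreach- line rather than pasting another block of commands.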
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.