2009/11/19 Joachim Landström <[email protected]>:
> What is Good Stata Programming Practice?
Since there's exactly one book on Stata programming
(http://www.stata.com/bookstore/isp.html), I think we all should blame
Kit Baum for failing to bring up perfect Stata code writing habits in
this community!
To Joachim's questions about classes and all -- off the top of my
head, I can name two people who have used Stata classes and OO features:
Vince Wiggins (and his team), who wrote Stata's graphics engine, and
Sergiy Radyakin (and his team), who wrote some crazy data management
stuff at the World Bank. As Nick Cox said, most statistical programming
is really straightforward in terms of computer science concepts. The
most complicated algorithms are loops (and R, for instance, still
struggles to have efficient loops, by the way), and they mostly arise
in (1) data structures that have some repetition, (2) iterative
maximization (taken care of by StataCorp with -ml- and Mata's
-optimize()-), and (3) display of parameter estimates. The most
complicated data structures that are commonly used are panel data.
There might be useful data structures for describing the
neighborhood of an observation -- they would come in extremely handy in
non-parametric smoothing, spatial problems, multivariate clustering or
matching estimators. In some of these areas, Stata does lag behind R
(where, I believe, the brute force of sifting through the data set to
find the neighbors is still used). Can these kinds of data structure
problems be solved with objects? Are there special data structures
invented for those purposes that could be coded in Stata? I don't
really know, but I doubt there has been computer science research
behind these topics.
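To make the "brute force" remark concrete, here is a minimal Mata
sketch of what sifting through the data set to find neighbors looks
like. The function name nn_brute() is made up for illustration; it is
not part of Stata or of any package. It assumes squared Euclidean
distance and returns, for each row, the index of its nearest other row.

```stata
mata:
// Brute-force nearest neighbor (illustrative sketch, hypothetical name).
// For each row of X, scan every other row and keep the closest one.
real colvector nn_brute(real matrix X)
{
    real scalar    i, j, n, d, best
    real colvector idx

    n   = rows(X)
    idx = J(n, 1, .)
    for (i = 1; i <= n; i++) {
        best = .                      // missing compares larger than any number
        for (j = 1; j <= n; j++) {
            if (j == i) continue      // a point is not its own neighbor
            d = sum((X[i, .] - X[j, .]) :^ 2)
            if (d < best) {
                best   = d
                idx[i] = j
            }
        }
    }
    return(idx)
}
end
```

After -sysuse auto-, something like
-mata: nn_brute(st_data(., ("price", "mpg")))- should list the nearest
neighbor of each car. The two nested loops make the cost grow with the
square of the number of observations, which is exactly why a smarter
neighborhood data structure would be welcome.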
The relative simplicity of Stata programming means "anyone can cook",
quoting Auguste Gusteau from "Ratatouille" :). Is this do-it-yourself
openness of Stata a double-edged sword? It is easy to write your own
programs, so is it too easy to write programs in bad style?
Sometimes, looking at other people's Stata code, I have some mental
comments like "This is more easily achieved via local macro manipulation"
or "If you really want this to work with the rest of Stata, you need
to ereturn this properly" or "What the heck does this underscore
command do?" (StataCorp code is usually transparent to
read until you hit something undocumented "because we did not think
anybody would be interested in knowing how it works"). Other times, I
look at other people's code and think, "How come I don't know this
command after 10+ years of Stata experience?" (the answer often is that it
came out in a new release, and I did not bother to read -help
whatsnew- carefully enough :)) or "Thank God they broke down this
parsing problem into five lines". Well, if it were possible to have
peer review of the code submitted to SSC or Stata Journal, just like
there is peer review of research submitted to academic journals, we
would have better code floating around; but there are simply no human
resources to do that. I believe most people stop once the
code produces apparently correct results for their own research
problems most of the time (remember that Stata is software for
professionals who need to write research papers and/or project
reports; they use it to solve their problems rather than to develop
cute packages). In all likelihood, this code can be improved in terms
of stability and usability. Some people are interested in producing
re-usable code that would help others (the altruism of this community is
still a badly under-studied area; see however
http://ideas.repec.org/p/boc/dcon09/6.html), and some people are
better at programming because of their formal training or extensive
earlier experience, so they produce better code. But requiring
everybody to write code in perfect style is as unrealistic as asking
everybody to exercise for a couple of hours every day -- yes, there
are obvious benefits to both, but few people can really get to do that
given that they have other things to accomplish.
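To illustrate the "ereturn this properly" comment above, here is a
minimal sketch of an e-class command. The name -mymean- is
hypothetical, and the estimator (a sample mean with the usual
variance-of-the-mean formula) is only a stand-in chosen to keep the
example short; the point is the posting of results, not the statistics.

```stata
* Hypothetical command -mymean-: posts a sample mean as an estimation result.
program define mymean, eclass
    version 11
    syntax varname(numeric) [if] [in]
    marksample touse

    quietly summarize `varlist' if `touse'
    local N = r(N)

    * 1 x 1 coefficient vector and its variance matrix
    tempname b V
    matrix `b' = (r(mean))
    matrix `V' = (r(Var) / r(N))
    matrix colnames `b' = `varlist'
    matrix rownames `V' = `varlist'
    matrix colnames `V' = `varlist'

    * ereturn post sets e(b), e(V), e(sample), and e(N)
    ereturn post `b' `V', esample(`touse') obs(`N')
    ereturn local cmd "mymean"
    ereturn display
end
```

With the results posted this way, typing -mymean mpg- after
-sysuse auto- should be followed happily by postestimation commands
such as -test mpg = 20-, which is what "working with the rest of
Stata" amounts to in practice.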
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.