Nick,
I agree with much of what you say. Indeed, I am a user who has no formal
programming training but has learnt the trade by myself (originally in
SAS), and so I know about the strategy of mimicking as a way to learn
programming. And I know that when I began to write code in SAS I did not
bother about naming conventions, breaking up a program into subroutines in a
clear and consistent manner, et cetera. This lack of good programming style
meant that I had to spend many hours just deciphering what I had previously
written. Looking at some ado-files, I get the feeling that code writers in
the Stata community may also use a similar approach.
So those who are new to programming and who use Stata as a tool to learn
programming will benefit greatly if they apply a certain style from the
start and stick to it. It becomes even easier to learn programming if that
style adheres to much of what is said in your reference below. Many such
users may not even think of the usefulness of a "good" style from the
beginning, and so your article may go unnoticed among those who need it
most. Thus style must not just be written about but must be practiced too.
Note that this is a general comment and hence not aimed at Nick's
programming practice. :-)
I also believe that if more people writing ado-files began to look at
style guides for, e.g., C++, we might begin to see more programs that have
"better" style and so are easier for their originators to maintain. As a
side effect, this will make it easier for those who aspire to learn
programming through mimicking.
In conclusion, I believe that those who write programs for Stata today will
benefit greatly from moving towards a more structured programming style
similar to that in, e.g., C++. By doing so, a trend may begin through
newbies like me who learn by copying Stata programming practice as seen in
published ado-files. In the end I believe everyone will be better off from
such behavior.
Joachim
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Thursday, November 19, 2009 1:02 PM
To: [email protected]
Subject: st: RE: What is good programming practice in Stata?
Joachim asked a (good!) question but then answered it ironically and
delivered observations on what is _common_ programming practice. That's fine
by me but we can't have a very fruitful discussion on this without
distinguishing _good_ and _common_.
I'd say style here rather than practice. Good programming practice certainly
includes good structure of programs and good strategy in planning programs
and designing syntax. Joachim's focus is more on the small stuff.
He seems to have missed one piece of pontification:
SJ-5-4  pr0018  . . . . . . . . Suggestions on Stata programming style
        . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        Q4/05   SJ 5(4):560--566                         (no commands)
        suggestions for good Stata programming style
-- but that doesn't touch on many of the details in his post.
We could talk about good writing style in English, Joachim could talk about
good writing style in Swedish, and we might still agree that really good
style was quite rare. So sampling programs written by Stata programmers,
many of whom would probably regard themselves as definitely still learners,
doesn't tell you much about good style, any more than sampling prose written
by university students in their native language tells you much about good
style.
Also, please, if you want to talk about Stata programming you must use Stata
terminology. I know what a parameter is in statistics, but not what it is as
part of any Stata language.
I think it's pretty clear, and spelled out in every history of Stata, that
the major past influence on Stata is the Unix/C culture. The major present
influence on Stata I would assert to be Stata, in that most programmers are
broadly imitative of what they read. I've no data, but it seems that for
many Stata is their first and only serious programming language. A common
question on starting Mata is: where are the local macros?
The overarching question is how far do we write in order to be read? I can
think of four answers:
1. Programmers should want to be able to read their own code easily if only
when they modify it at later dates. This I take to be obvious to all who
code but by itself it implies only that you should follow personal style
rules.
2. Programmers may write in collaboration with others. In Stata, most
programs are really rather short by general computing standards and the
modal number of programmers per program is easily one, so this doesn't often
bite.
3. Programmers may expect to write a program but have others take it over at
some later date. This is standard in many institutions but not very common
-- indeed a sensitive subject -- in Stata. Death, apparent abandonment or
religious conversion (e.g. the original programmer becoming an adept of some
other software) would be the main reasons for taking on someone else's code.
4. Programmers might expect to be read by users. My own haphazard sampling
suggests that most users have absolutely no intention of reading code even
when it would be highly instructive. Personally, I write to be used rather
than read, but I am not embarrassed at the thought of being read.
On your specifics:
#1 Do not use either Pascal casing (MyVar) or Camel casing (myVar) for
variables and parameters, just stick to lowercase.
No rule against either, but I'd agree both are very uncommon styles in
Stata.
#2 Do not use meaningful and descriptive words to name variables
I'd agree that good names are a good idea. I don't think that's subversive.
#3 Use as many single-character variable names as you like and surely do not
comment on what they are
I'd say single character names are the norm for loop indexes and often when
statistical conventions are being echoed. No statistical programmer I can
imagine would regard it as obscure to call a response variable -y-. Putting
-response-, -outcome- or whatever just bloats your code. But I'd often write
something like -yvar- too.
I'd agree there's a culture of commenting only sparsely in Stata. I think
you'd find a chorus against over-commenting as just adding clutter. If you
have to lean on the comments, you don't understand enough Stata to be able
to understand the code! It may have grown out of the early days of Stata
when developers could just walk a short distance and ask for explanation of
tricky code. To my mind, there's an inevitable sameness about many Stata
programs (syntax checking/data checking/preliminary calculation/main
calculation/display of results/return saved results) that makes comment
unnecessary unless you were writing code to be read in a course on Stata
programming, in which you are on your best behaviour and write artificially.
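The sameness mentioned above can be sketched as a skeleton. This is only a minimal illustration under stated assumptions, not code from any published ado-file: the program name -mymean- is hypothetical, and the comments mark the stages listed in the previous paragraph.

```stata
*! mymean 1.0.0  (hypothetical example, for illustration only)
program mymean, rclass
    version 11

    // syntax checking
    syntax varname(numeric) [if] [in]

    // data checking
    marksample touse
    quietly count if `touse'
    if r(N) == 0 error 2000

    // main calculation
    quietly summarize `varlist' if `touse', meanonly

    // display of results
    display as text "mean of `varlist' = " as result %9.0g r(mean)

    // return saved results
    return scalar mean = r(mean)
    return scalar N = r(N)
end
```

Saved as mymean.ado, it could be called as, say, -mymean mpg if foreign- after -sysuse auto-. With the stages this predictable, the comments add little for an experienced reader, which is the point being made above.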
#4 Do not bother to use method names that are dissimilar to existing functions
(i.e., display versus Display)
I am not sure of your point here, but I'd say that case distinctions are
rarely a good way to make code clear unless you have some personal style
rules that you stick to absolutely. I'll sometimes use a case distinction
very briefly to do something.
#5 Do not separate logical groups of code
It's indeed common to get long unbroken code segments. There is no good
defence except that it may not matter much.
#6 There is no consensus about numbers of blank lines between different
methods in an ado-file.
That's probably correct. I don't like more than single lines. Double lines
or more just lengthen program files to no good purpose.
#7 Do not use single spaces before and after operators and brackets.
This is discussed in my 2005 paper. I'm a strong advocate of adding spaces
for clarity but keeping lines short is another conflicting aim.
#8 By all means use as many abbreviations as possible
See above.
Nick
[email protected]
Joachim Landström
I have been browsing around on the Internet trying to find suggestions about
good programming practice in Stata and have failed to do so. Thus I pose
this question.
When I have a look at ados, it does seem to me that good programming
practice in Stata amounts to:
#1 Do not use either Pascal casing (MyVar) or Camel casing (myVar) for
variables and parameters, just stick to lowercase.
#2 Do not use meaningful and descriptive words to name variables
#3 Use as many single-character variable names as you like and surely do not
comment on what they are
#4 Do not bother to use method names that are dissimilar to existing functions
(i.e., display versus Display)
#5 Do not separate logical groups of code
#6 There is no consensus about numbers of blank lines between different
methods in an ado-file.
#7 Do not use single spaces before and after operators and brackets.
#8 By all means use as many abbreviations as possible
.
.
.
Well, I could continue, but the more I write, the more I feel that this is
becoming a list of bad programming practice.
If we have a look at good programming "code of conduct" in, e.g., C++ or Java,
we see extensive use of different types of casing to separate classes,
methods, variables and parameters. Variables are given descriptive names;
commenting is sparse and largely unnecessary since descriptive names are
used; and abbreviations are avoided, as are single-character variables. Single
spaces are used both before and after operators and brackets.
I could go on about this issue, but being rather new as a Stata user, my
empirical sample of ados may be biased, and that is why I raise this
question.
What is Good Stata Programming Practice?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/