Re: st: Collaborating in stata: how to share code and control version
From: Phil Schumm <[email protected]>
To: [email protected]
Date: Fri, 10 Apr 2009 10:50:06 -0500
On Apr 9, 2009, at 1:50 PM, Fabrice wrote:
> I'm considering options to put structure into developing shared code
> in Stata. Currently, we already share code by using the ado-directory
> mechanism, which works very well (i.e., putting code files that
> everyone else can use into a shared ado directory).
>
> However, we are starting to encounter problems where two people
> modify the same file concurrently, or worse, one erases someone
> else's work.
>
> This is a typical case where a version control system is required,
> and I wonder if anyone has anything to recommend. Note 1: The
> environment is Windows. Note 2: version control for Stata development
> should not be confused with "version control" in Stata, which only
> refers to the fact that Stata lets you enforce the version of the
> system under which the code is run.
>
> I've looked on the web and found hints that Emacs can a) interface
> with Stata and b) provide some version control. Yet I wonder whether
> that works smoothly and is worth the effort.
>
> What is your experience on that matter? Has anyone found anything
> elegant (i.e., simple) to manage this under Windows?
As you may be aware, there are many version control systems (VCS) out
there (e.g., see http://en.wikipedia.org/wiki/List_of_revision_control_software),
and some of the best ones are open source. The hot thing in version
control right now is "distributed" version control, a well-known
exemplar of which is Git (http://git-scm.com/). The Linux kernel is
developed in Git (which, BTW, was initially designed by Linus Torvalds
for this purpose), and GitHub (http://github.com/) is developing quite
a following.
Personally, we use Subversion (http://subversion.tigris.org/), which
follows more of a client-server model than a distributed model.
However, there are a lot of things I like about Subversion, and it
suits our needs well. A lot of open-source software projects use
Subversion, though some have now switched over to Git (and still
others use one of the many other distributed VCSs). Subversion began
life as a re-conceptualization of an earlier VCS called Concurrent
Versions System (CVS). Many years ago, CVS was just about the only
non-commercial system available, and it was therefore ubiquitous. In
fact, many people continue to use it today. CVS had quite a few
warts, and was (IMO) painful to use. Subversion addressed these
issues, and, in contrast, is quite easy to use.
I don't want to start a debate over the merits of Subversion versus
Git (or any other system); if you want to read more on this, Google
will be happy to oblige. I will, however, share with you a couple of
the reasons why Subversion works so well for us.
One reason is that it is very easy to set up and administer, and has a
very small footprint. For example, Subversion comes pre-installed on
OS X, is easy to install using the appropriate package installer under
Linux, and a double-click installer is available for Windows. After
that, all you need is

    svnadmin create foo

to create a new repository called foo. The repository is then a
stand-alone directory, and can be moved around and backed up just as you do
with other files on your filesystem. Configuring behavior such as email
notifications on commits or restricted permissions is very easy to do.
And the entire repository can be dumped to a single file (with svnadmin
dump), which can then be edited with a text editor if necessary. Very
straightforward and intuitive. At the same time, it scales well to
large repositories and many users.
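To make this concrete, here is a minimal sketch of the round trip after `svnadmin create`, assuming a Unix-like shell with the Subversion command-line tools (`svn`, `svnadmin`, `svnlook`) on the PATH. The directory `svn-demo`, the repository name `adorepo` and the file `myreg.ado` are hypothetical; on Windows, TortoiseSVN offers the same checkout/add/commit/update steps through its right-click menus.

```shell
#!/bin/sh
# Sketch of a shared ado directory under Subversion.
# All names (svn-demo, adorepo, myreg.ado) are hypothetical.
# Assumes svn, svnadmin and svnlook are installed, and that the
# current directory path contains no spaces (it becomes a file:// URL).
set -e

base="$(pwd)/svn-demo"
rm -rf "$base"            # start from a clean scratch directory
mkdir -p "$base"

# 1. Create the repository: a self-contained directory on disk.
svnadmin create "$base/adorepo"

# 2. Check out a working copy (file:// URLs need an absolute path).
svn checkout --quiet "file://$base/adorepo" "$base/wc"

# 3. Add a small Stata program and commit it as revision 1.
printf 'program define myreg\n    display "hello"\nend\n' > "$base/wc/myreg.ado"
svn add --quiet "$base/wc/myreg.ado"
svn commit --quiet -m "Add myreg.ado" "$base/wc"

# 4. Each collaborator picks up the others' commits with svn update.
svn update --quiet "$base/wc"

# 5. Back up the whole repository as a single dump file.
svnadmin dump --quiet "$base/adorepo" > "$base/adorepo.dump"

# Print the newest revision number in the repository.
svnlook youngest "$base/adorepo"
```

The last command should print 1, since the fresh repository starts at revision 0 and the sketch makes exactly one commit. In practice the working copy (here `wc`) would be the directory each user adds to Stata's ado path, so everyone runs the committed versions of shared programs.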
A second reason is that Subversion is easy to use. Of course, I say
this as someone who programs and used to use CVS. A better testament
to its ease-of-use is the fact that we have had a lot of success in
getting non-programmers (e.g., researchers, study coordinators, etc.)
who have never used a VCS to use it. We typically spend about an hour
giving an introduction/tutorial, and, after that, they're ready to
go. As a result, we are able to use Subversion to manage not only our
internal code, but also the data manipulation, analyses, and even
administrative documents for several large research projects. This
permits the researchers on those projects to collaborate in ways that
they could not do otherwise, and facilitates reproducibility.
Finally, although our programmers all use Unix/Linux, most of our
"users" use Windows. Fortunately, there's a wonderful Windows
application (implemented as a Windows shell extension) called
TortoiseSVN (http://tortoisesvn.tigris.org/) which allows Windows
users to access all the features of Subversion via menu items
integrated into the standard Windows contextual (i.e., right-click)
menus. TortoiseSVN is probably the biggest reason for our success in
getting non-programmers to use version control.
The canonical reference on Subversion is the book written by
Collins-Sussman, Fitzpatrick and Pilato, which is freely available over the
web (http://svnbook.red-bean.com/). Note, however, that like the
Stata reference manuals, all of the examples are illustrated at the
command line. Thus, Windows users might prefer to start by reading
the documentation that comes with TortoiseSVN, which is excellent.
In sum, it is worth spending some time researching a few different
systems. However, if the capabilities provided by Subversion are
adequate for your needs, I'd heartily recommend it.
And, in case you ever decide you want to switch to Git in the future,
it's easy to convert your Subversion repositories.
-- Phil