Re: st: Collaborating in stata: how to share code and control version

From	Phil Schumm <[email protected]>
To	[email protected]
Subject	Re: st: Collaborating in stata: how to share code and control version
Date	Fri, 10 Apr 2009 10:50:06 -0500

On Apr 9, 2009, at 1:50 PM, Fabrice wrote:

I'm considering options to put structure into developing shared code in Stata. Currently, we already share code by using the ADO directory mechanism, that is very fine (i.e. putting into a shared ado directory files code that everyone else can use).
However, we start encountering problems whereby two people are modifying the same file concurrently, or worse, one erasing someone's else work.
This is a typical case where a version control system is required and I wonder if anyone has anything to recommend. Note1: The environment is windows. Note2: version control for stata development should not be confused with "version control" in stata, which only alludes to the idea that Stata allows to enforce the version of the system under which the code is run.
I've looked on the web, and found hints that Emacs that would a) interface with Stata and b) provide some version control system. Yet, I wonder if that works smoothly and is worth the effort.
What is your experience on that matter? Has anyone found anything elegant (i.e. simple) to manage this under windows?

As you may be aware, there are many version control systems (VCS) out there (e.g., see http://en.wikipedia.org/wiki/List_of_revision_control_software), and some of the best ones are open source. The hot thing in version control right now is "distributed" version control, a well-known exemplar of which is Git (http://git-scm.com/). The Linux kernel is developed in Git (which, BTW, was initially designed by Linus Torvalds for this purpose), and GitHub (http://github.com/) is developing quite a following.

Personally, we use Subversion (http://subversion.tigris.org/), which follows more of a client-server model than a distributed model. However, there are a lot of things I like about Subversion, and it suits our needs well. A lot of open-source software projects use Subversion, though some have now switched over to Git (and still others use one of the many other distributed VCSs). Subversion began life as a re-conceptualization of an earlier VCS called Concurrent Versions System (CVS). Many years ago, CVS was just about the only non-commercial system available, and it was therefore ubiquitous. In fact, many people continue to use it today. CVS had quite a few warts, and was (IMO) painful to use. Subversion addressed these issues, and, in contrast, is quite easy to use.

I don't want to start a debate over the merits of Subversion versus Git (or any other system); if you want to read more on this, Google will be happy to oblige. I will, however, share with you a couple of the reasons why Subversion works so well for us.

One reason is that it is very easy to set up and administer, and has a very small footprint. For example, Subversion comes pre-installed on OS X, is easy to install using the appropriate package installer under Linux, and a double-click installer is available for Windows. After that, all you need is


svnadmin create foo

to create a new repository called foo. The repository is then a stand-alone directory, and can be moved around and backed up just as you do with other files on your filesystem. Configuring behavior such as email notifications on commits, restricted permissions, etc. is very easy to do. And, the entire repository can be dumped to a file which can then be edited with a file editor, if necessary. Very straightforward and intuitive. At the same time, it scales very well (i.e., to large repositories and/or many users).
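To make the setup steps concrete, here is a minimal sketch of creating the repository, committing a first file, and dumping the history to a single file. It assumes a Unix-like shell with the Subversion command-line tools on the PATH; the scratch directory, the working copy name wc, and the file hello.do are hypothetical names chosen for illustration only.

```shell
#!/bin/sh
# Sketch only: the scratch directory, "wc", and "hello.do" are hypothetical.
set -e

# Skip quietly if the Subversion tools are not installed
command -v svnadmin >/dev/null 2>&1 || { echo "Subversion not found; skipping" >&2; exit 0; }

work=$(mktemp -d)                 # scratch directory for the demo
cd "$work"

svnadmin create foo               # the repository is an ordinary directory

# Check out a working copy through a file:// URL (no server needed)
svn checkout -q "file://$work/foo" wc

# Add a Stata do-file and commit it as revision 1
echo 'display "hello"' > wc/hello.do
svn add -q wc/hello.do
svn commit -q -m "Add hello.do" wc

# Dump the entire repository history to one plain file
svnadmin dump -q foo > foo.dump
ls -l foo.dump
```

For a shared setup the repository would normally sit on a server or network share rather than in a scratch directory; the file:// URL is used here only so the sketch runs on a single machine.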

A second reason is that Subversion is easy to use. Of course, I say this as someone who programs and used to use CVS. A better testament to its ease-of-use is the fact that we have had a lot of success in getting non-programmers (e.g., researchers, study coordinators, etc.) who have never used a VCS to use it. We typically spend about 1 hour giving an introduction/tutorial, and, after that, they're ready to go. As a result, we are able to use Subversion to manage not only our internal code, but also the data manipulation, analyses, and even administrative documents for several large research projects. This permits the researchers on those projects to collaborate in ways that they could not do otherwise, and facilitates reproducibility.
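As an illustration of the day-to-day cycle, and of how it handles the "two people modifying the same file" problem raised in the original question, here is a small sketch using two working copies of one repository. The names repo, alice, bob, and analysis.do, and the helper write_do, are hypothetical; the sketch assumes a Unix-like shell with the Subversion command-line tools installed.

```shell
#!/bin/sh
# Sketch only: repository, user, and file names are hypothetical.
set -e
command -v svn >/dev/null 2>&1 || { echo "Subversion not found; skipping" >&2; exit 0; }

work=$(mktemp -d)
cd "$work"
svnadmin create repo
url="file://$work/repo"

# Hypothetical helper: write_do FILE USE_LINE REGRESS_LINE
write_do() {
    printf '%s\n' '* analysis.do' "$2" 'describe' 'summarize' \
        'tabulate group' "$3" '* end' > "$1"
}

# Alice adds a do-file and commits it (revision 1)
svn checkout -q "$url" alice
write_do alice/analysis.do 'use mydata' 'regress y x'
svn add -q alice/analysis.do
svn commit -q -m "Add analysis.do" alice

# Bob checks out his own working copy of revision 1
svn checkout -q "$url" bob

# Alice changes line 2 and commits (revision 2)
write_do alice/analysis.do 'use mydata, clear' 'regress y x'
svn commit -q -m "Clear memory before loading" alice

# Bob changes line 6; his working copy is still based on revision 1,
# so Subversion refuses the commit instead of overwriting Alice's work
write_do bob/analysis.do 'use mydata' 'regress y x z'
if svn commit -q -m "Add covariate z" bob 2>/dev/null; then
    echo "unexpected: commit accepted"
else
    echo "commit rejected: bob's copy is out of date"
fi

# Bob updates (Alice's change is merged into his file), then commits
svn update -q --non-interactive bob
svn commit -q -m "Add covariate z" bob
cat bob/analysis.do
```

Because the two edits touch different lines, the update merges them without a conflict; when edits overlap, Subversion instead flags a conflict for the user to resolve before committing.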

Finally, although our programmers all use Unix/Linux, most of our "users" use Windows. Fortunately, there's a wonderful Windows application (implemented as a Windows shell extension) called TortoiseSVN (http://tortoisesvn.tigris.org/) which allows Windows users to access all the features of Subversion via menu items integrated into the standard Windows contextual (i.e., right-click) menus. TortoiseSVN is probably the biggest reason for our success in getting non-programmers to use version control.

The canonical reference on Subversion is the book written by Collins-Sussman, Fitzpatrick and Pilato, which is freely available over the web (http://svnbook.red-bean.com/). Note, however, that like the Stata reference manuals, all of the examples are illustrated at the command line. Thus, Windows users might prefer to start by reading the documentation that comes with TortoiseSVN, which is excellent.

In sum, if you want, you should definitely spend some time researching a few different systems. However, if the capabilities provided by Subversion are adequate for your needs, I'd heartily recommend it. And, in case you ever decide you want to switch to Git in the future, it's easy to convert your Subversion repositories.
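One possible route for such a conversion is the git svn bridge that ships with many Git installations; this is an assumption about tooling, not necessarily the only or best method. The sketch below builds a tiny throwaway Subversion repository (foo, wc, and hello.do are hypothetical names) and clones its history into a Git repository called foo-git.

```shell
#!/bin/sh
# Sketch only: assumes both Subversion and the git-svn bridge are installed;
# all repository and file names are hypothetical.
set -e
command -v svnadmin >/dev/null 2>&1 || { echo "Subversion not found; skipping" >&2; exit 0; }
git svn --version >/dev/null 2>&1 || { echo "git-svn not found; skipping" >&2; exit 0; }

work=$(mktemp -d)
cd "$work"

# A throwaway Subversion repository with a single commit
svnadmin create foo
svn checkout -q "file://$work/foo" wc
echo 'display "hello"' > wc/hello.do
svn add -q wc/hello.do
svn commit -q -m "Add hello.do" wc

# Import the Subversion history into a new Git repository
git svn clone "file://$work/foo" foo-git

# Each Subversion revision becomes a Git commit
git -C foo-git log --format=%s
```

A repository that uses the conventional trunk/branches/tags layout would typically be cloned with the --stdlayout option so that branches and tags are mapped as well.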



-- Phil

