Re: st: making Stata read do-files
On Apr 16, 2008, at 2:14 PM, Gabi Huiber wrote:
The larger problem I was trying to solve was this: go through a
mess of directory paths and eventually find a load of do-files in
each of them, saved weekly -- sometimes with names such as
fileYYYYMMDD.do, and other times with names such as fileapr1608.do.
Then read each of those do-files line by line, but don't interpret
them. Instead, write each line to a .dta file as an observation in
a variable called cmd (as in command). Next to it, write the date
of the file that that line came from, in the format YYYYMMDD
(because it reads well to the human eye and sorts chronologically),
as the corresponding observation in a variable called date. Then
drop duplicates in terms of cmd.
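A minimal sketch of the scraping step described above, written in Python rather than Stata purely for illustration. Several things here are assumptions and not part of the original post: the function names `date_from_name` and `collect_commands`, the reading of `apr1608` as month, day, two-digit year (taken to mean 20YY), and the choice to keep the earliest date when the same line appears in several files.

```python
import re
from datetime import date
from pathlib import Path

# The two naming styles mentioned in the post: fileYYYYMMDD.do and fileapr1608.do.
# Reading "apr1608" as mmmDDYY with a 20YY year is an assumption.
NUMERIC = re.compile(r"(\d{4})(\d{2})(\d{2})\.do$")
MONTHLY = re.compile(r"([a-z]{3})(\d{2})(\d{2})\.do$")
MONTHS = {m: i + 1 for i, m in enumerate(
    "jan feb mar apr may jun jul aug sep oct nov dec".split())}


def date_from_name(name):
    """Return the date encoded in a do-file name as an int YYYYMMDD, or None."""
    name = name.lower()
    try:
        m = NUMERIC.search(name)
        if m:
            y, mo, d = (int(g) for g in m.groups())
            return int(date(y, mo, d).strftime("%Y%m%d"))
        m = MONTHLY.search(name)
        if m and m.group(1) in MONTHS:
            d, yy = int(m.group(2)), int(m.group(3))
            return int(date(2000 + yy, MONTHS[m.group(1)], d).strftime("%Y%m%d"))
    except ValueError:
        pass  # digits that do not form a real calendar date
    return None


def collect_commands(root):
    """Map each distinct line (cmd) to the earliest file date it appears under."""
    first_seen = {}
    for path in sorted(Path(root).rglob("*.do")):
        file_date = date_from_name(path.name)
        if file_date is None:
            continue  # name matches neither pattern; skipped rather than guessed
        for line in path.read_text(errors="replace").splitlines():
            if line not in first_seen or file_date < first_seen[line]:
                first_seen[line] = file_date
    return first_seen
```

Calling `collect_commands` on a top-level directory walks all subdirectories for `.do` files. Note that this sketch records only the line text and a date, as in the description above, so the position of a line within its file is not kept.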
The goal is twofold: I want to easily track changes made to the do-
files over time, and I want to use these dta files to make Stata
write its own do-files on the fly. If, for example, I want to
reconstitute the weekly do-file saved on 20071231, I just keep all
the observations in the dta file where date<=20071231. As time goes
by and people keep saving these weekly do-files, I just send Stata
to scrape the directories anew and re-assemble the master dta file.
I did not want to mess with the do-file names because other people
still use them and I wanted to do my work with as little disruption
to them as possible.
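The date filter described above can be sketched as follows, again in Python for illustration only. The function name `reconstitute` and the sample records are hypothetical; the only idea taken from the post is keeping observations where date<=20071231, which works because integer YYYYMMDD values sort chronologically.

```python
def reconstitute(records, cutoff):
    """Return the cmd lines whose date is on or before cutoff (int YYYYMMDD)."""
    return [cmd for cmd, file_date in records if file_date <= cutoff]


# Hypothetical (cmd, date) pairs standing in for the master dataset.
records = [
    ("use data", 20071224),
    ("regress y x", 20071231),
    ("regress y x z", 20080416),
]

# Lines that would make up the do-file as of 20071231.
print("\n".join(reconstitute(records, 20071231)))
```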
Of course, had my client used some kind of proper revision control
system, like RCS in Unix, this effort would have been unnecessary.
How do the Statalisters deal with revision control? Is there a
Stata-specific good practices write-up on the matter? Might
somebody present one at the Chicago meeting?
What you describe above sounds like reinventing a Version Control
System (VCS), and, as you noted, it would make *much* more sense just
to use one of the many existing systems. There are now many
excellent, easy-to-use systems freely available (as well as many
commercial systems, of course).
To give you some background, in our shop we split our time between
data collection and management (including managing the public
releases of several large datasets) and statistical analyses. All of
this work is stored in a VCS (we currently use and strongly recommend
Subversion), and we absolutely could not function without it (at
least I would not want to consider such a scenario). A VCS is
usually pretty agnostic WRT what type of code you are storing in it,
but, as you might guess, we have developed several tricks to
facilitate the type of work we do, and to facilitate use of
Subversion with Stata (we also store non-Stata code in the repository).
You might also be interested to know that we have some experience in
training "non-technical" users to use our repositories (by non-
technical here I am referring primarily to data analysts who range in
their ability to use a package like Stata but are definitely not
computer programmers). I'm not going to suggest that this is always
easy (it isn't), but, under the right circumstances, we have evidence
that it can work.
I believe I brought this topic up in conversation at a users' meeting
a few years ago, and no one seemed interested. Thus, I don't know
whether an actual presentation on this would be appropriate at a
Stata users' meeting (plus, as I noted, most of the issues involved
are entirely general and not Stata-specific). But, as luck would
have it, I will be at the meeting in Chicago this summer, and would
be glad to sit down with you (and anyone else who's interested) and
share our thoughts and experiences WRT this.
-- Phil