I work in a research group that is about to build a multiyear data
structure to support research. Historically, data management strategies
have resulted in multiple derivative versions of key variables and
concepts. For example, imagine a continuous variable that is then
variously categorized, variously trimmed, and so on. A whole cluster of
derived variables results, but the underlying do-files are not uniformly
preserved, which undermines the integrity of the resulting derivative
datasets. I suspect this is a fairly typical story.
Clearly we are not alone in this challenge. In SAS, I might generate a
root set of variables and then handle my trimming and recoding through a
variety of format statements that I could optionally attach to the
core variables. However, that concept doesn't quite fit Stata.
I would therefore welcome suggestions on how you manage this sort of
data integrity and documentation problem in Stata.
Thanks,
Rob