Re: st: repeat same commands over hundreds of files

From   Eric Booth <[email protected]>
To   "<[email protected]>" <[email protected]>
Subject   Re: st: repeat same commands over hundreds of files
Date   Tue, 2 Nov 2010 20:22:47 +0000


Hi Tom:

The best approach probably depends on how your file names are sequenced and how your folders/files are organized, but programs like -fs- (from SSC) and others are useful for this type of work.  Here are two approaches:

Assuming you've got files named sequentially by year (as in your example, where 1972 appears in the file name), you could use a -forvalues- loop like:

forval n = 1972/1981 {
	**-clear- is needed so each pass can load a new file over the data in memory**
	insheet using "/Users/tbrunell/MPG/CT/mpg_09_CTC`n'_`n'_EDCD11_10_JH22.csv", clear
	drop in L /*this drops file notation at the bottom*/
	gen demper=dem/(dem+rep)
	gen demwin=.
	replace demwin=1 if demper>.5 & demper~=.
	replace demwin=0 if demper<.5
	sort rkey
	gen overalldemper=overalldem/(overalldem+overallrep)
	collapse (count) numberofseats=demper (sum) demwin (mean) year demper overalldemper (p50) median=demper, by(rkey)
	gen percentdemdist=demwin/numberofseats

	**create a macro for the decade**
	local save
	if inrange(`n', 1970, 1979) local save 1970
	if inrange(`n', 1980, 1989) local save 1980

	**note: with -replace-, years in the same decade overwrite one file; add the year to the name to keep each one**
	save "/Users/tbrunell/MPG/CT/CTC`save's", replace
}

Note the use of the local macros to create the decade for the -save- filename.
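
If the list of decades grows, the decade could also be computed from the year rather than spelled out range by range. This is only a sketch of that idea; it prints the name that would be used instead of saving anything, and it assumes the same CTC<decade>s naming as above:

```stata
* Sketch: derive the decade arithmetically instead of using -inrange()-
forvalues n = 1972/1981 {
	* floor(1972/10) is 197, so this gives 1970; 1981 gives 1980
	local save = 10*floor(`n'/10)
	display "year `n' -> CTC`save's"
}
```

The same -local save- line could replace the two -inrange()- lines inside the loop.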

Another approach is to find all the .csv files in your folder using the macro extended functions (see -help extended_fcn-) and run the code on all of them. The same idea extends to finding all the folders of interest and then all the .csv files within them. For example:

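global files : dir "<folder path>" files "*.csv", respectcase
tokenize `"$files"'
di in yellow `"$files"'

while "`1'" != "" {
	**`1' already ends in .csv, so strip the extension before building the .dta name**
	local stub = subinstr("`1'", ".csv", "", .)
	insheet using "/Users/tbrunell/MPG/CT/`1'", clear
	save "/Users/tbrunell/MPG/CT/`stub'.dta", replace
	macro shift
}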
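
Since the Stata techs mentioned -foreach-, the same idea can also be written with a local macro and -foreach- instead of -tokenize- and -macro shift-. This is a rough sketch, not tested on your files; the macro names folder, files, f, and stub are just placeholders I made up, and the folder is the one from your example:

```stata
* Sketch: loop over every .csv in one folder with -foreach-
local folder "/Users/tbrunell/MPG/CT"
local files : dir "`folder'" files "*.csv", respectcase

foreach f of local files {
	* `f' includes the .csv extension; strip it to build the .dta name
	local stub = subinstr("`f'", ".csv", "", .)
	insheet using "`folder'/`f'", clear
	* ... your analysis commands go here ...
	save "`folder'/`stub'.dta", replace
}
```

If you install -fs- (ssc install fs), I believe the same loop can run over `r(files)' after -fs *.csv-, but check its help file for the details.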

- Eric
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]

P.S.  Say "Hi" to Dave Smith for me if he's still around there.

On Nov 2, 2010, at 2:57 PM, tbrunell wrote:

> I am doing some simple analysis on election data that spans all the states and several decades.
> So I have hundreds of files that I want to do the same relatively simple analysis on (I have an example below).
> At first I started writing .do files for each state/year and the only things I changed were the 
> 1) file name for the insheet command
> 2) the name and location of the collapsed file at the end.
> However, when I wanted to add an additional command this meant opening hundreds of separate .do files, making a change, resaving the file.  It is not the end of the world, but I would prefer to set up the commands and then, somehow, tell stata to run the commands separately for each specified file and then save the resulting file with some new name.
> The techs at Stata recommended using macros for file names and the foreach command.  But that doesn't solve my filename and output file problem.
> Any recommendations would be much appreciated.
> Tom Brunell
> Professor of Political Science
> University of Texas at Dallas
> _____________________________
> clear
> insheet using "/Users/tbrunell/MPG/CT/mpg_09_CTC1972_1972_EDCD11_10_JH22.csv"
> drop in L /*this drops file notation at the bottom*/
> compress
> gen demper=dem/(dem+rep)
> gen demwin=.
> replace demwin=1 if demper>.5 & demper~=.
> replace demwin=0 if demper<.5
> sort rkey
> gen overalldemper=overalldem/(overalldem+overallrep)
> *here overalldemper will be total votes percentage, demper is "normalized" vote - averaged across districts
> collapse (count) numberofseats=demper (sum) demwin (mean) year demper overalldemper (p50) median=demper,by(rkey)
> gen percentdemdist=demwin/numberofseats
> save "/Users/tbrunell//MPG/CT/CTC1970s", replace
> *
