Re: st: Pulling in files and data stored in a folder tree
From: "Lacy,Michael" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Pulling in files and data stored in a folder tree
Date: Sat, 28 Jul 2012 13:45:28 +0000
"Ben Hoen" <[email protected]> wrote:
>Date: Fri, 27 Jul 2012 11:26:31 -0400
>Subject: st: Pulling in files and data stored in a folder tree
>
>Hi Statalisters,
>
>I have a set of ~ 200,000 records stored in one dataset (“master file”) each
>of which has a year and a county to which it applies, and a unique record
>id. Separately I have a large set of files that are stored by county (of
>which there are 20, so there are 20 county folders) and year (for each
>county there are 10 year folders – 2002 through 2011). In each year folder,
>there are 4 files that I want to pull data from (via 1:1 merge with the
>“master file” using the record id). There are roughly 10 variables I want
>to add to the master file from these 4 files, or approximately 2 to 3 from
>each file.
>
>So, the question is how I might write code that will go through each record
>in the master file, determine the year and the county, go through the folder
>tree to find the appropriate year in the appropriate county, and then merge
>with the four files “keeping” the data from the 10 variables?
>
>A few things to note: 1) the files I want to pull data from are column
>separated text files (i.e., I have not gone through the trouble of
>converting them to Stata files yet – but could…); and, 2) all of the files
>from which I want to pull data are named by county and year (e.g.,
><countyname>_<year>_<filename>) and these names match exactly with the
>county names and years stored in the master file.
>
Yes, you need to convert them to Stata files first.
I'd think about applying -levelsof- to your master file to get
the names of your counties and years, and use those to get
into the folder containing each file that you need
to -insheet- into a Stata file. I'd save each of these as one of
a numbered list of tempfiles, and then merge each
one onto your master.
Something like this is what I was thinking of:
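use master
levelsof county, local(counties)
levelsof year, local(years)
clear
local startdirectory "`c(pwd)'"  // folder holding the master file
local basedirectory "whatever"   // full path to the top of the county/year folder tree
local filecount = 0
// Put all the using files into Stata format,
// and save them in numbered temp files
foreach c of local counties {
    foreach y of local years {
        // Forward slashes work on every platform; a backslash in front of
        // a macro reference would stop the macro from being expanded
        cd "`basedirectory'/`c'/`y'" // whatever fits your file system
        insheet using "first file of 4" .....
        local filecount = `filecount' + 1
        keep ...list of the variables of interest from file 1 (plus the record id)
        tempfile temp`filecount'
        save `temp`filecount''
        ....
        ....
        insheet using "last file of 4" .....
        local filecount = `filecount' + 1
        keep ...list of the variables of interest from file 4 (plus the record id)
        tempfile temp`filecount'
        save `temp`filecount''
    }
}
// Go back to where the master file lives, reload it,
// and merge each temp file onto it
cd "`startdirectory'"
use master, clear
forval i = 1/`filecount' {
    // recordid stands in for the name of your unique record id; county and
    // year alone do not identify single records, so they cannot be a 1:1 key.
    // -update- lets each file fill in values still missing in the master.
    merge 1:1 recordid using "`temp`i''", update
    tab1 _merge
    keep if (_merge != 2)
    drop _merge
}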
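Since your files are named <countyname>_<year>_<filename>, the conversion loop could also build each full path directly instead of using -cd-, and skip county/year combinations that have no file. This is only a rough sketch, not tested on your data: "whatever", parcels.txt, recordid, var1 and var2 are made-up placeholders for your own base directory, file name, record id and variables of interest, and -insheet- is assumed to detect the delimiter on its own.

```stata
// Sketch only: every name below except county and year is a placeholder
use master, clear
levelsof county, local(counties)
levelsof year, local(years)
local basedirectory "whatever"
local filecount = 0
foreach c of local counties {
    foreach y of local years {
        // Path follows the <countyname>_<year>_<filename> naming pattern
        local f "`basedirectory'/`c'/`y'/`c'_`y'_parcels.txt"
        // -confirm file- sets a nonzero return code if the file is absent
        capture confirm file "`f'"
        if _rc {
            continue
        }
        insheet using "`f'", clear
        keep recordid var1 var2
        local filecount = `filecount' + 1
        tempfile temp`filecount'
        save "`temp`filecount''"
    }
}
```

Because nothing here changes the working directory, the master file can afterwards be reloaded by name before the merge loop.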
Regards,
Mike Lacy
Dept. of Sociology
Colorado State University
Fort Collins CO U.S
Mike Lacy
Assoc. Prof./Dir. Grad. Studies
Dept. of Sociology
Colorado State University
Fort Collins CO 80523-1784
970.491.6721 (voice)