Re: st: Singeling out datasets containing variable X in folder with many stata files
From
Stas Kolenikov <[email protected]>
To
[email protected]
Subject
Re: st: Singeling out datasets containing variable X in folder with many stata files
Date
Tue, 14 Jun 2011 15:16:20 -0500
On Tue, Jun 14, 2011 at 6:14 AM, Lukas Maximilian Rudolph
<[email protected]> wrote:
> Dear Statalisters,
>
> I have a folder with many Stata files that I am about to merge. Some of these contain information on household level, some on individual level. Of these, some are in wide, some are in long form.
>
> I now want to identify all file names that contain a certain variable:
> I want to separate all files with the variable "pidlink", the individual identifier.
> Within these, I want to identify all files that contain a variable ending with "*type" as just these are in long form.
>
> Then I would be able to construct one loop that reshapes all datasets in long form and then another loop that merges all files on individual and household level automatically without going through every single file.
>
> My thought would have been to try to save the files in different folders conditional on whether the respective variable is contained - but save is not combinable with "if".
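local allfiles : dir . files "*.dta"
tokenize `allfiles'
local idlist
local longlist
* Loop over the file names until the positional macros run out
while `"`1'"' != `""' {
    * Load a single observation: only the variable names are needed.
    * -clear- discards whatever is currently in memory.
    use in 1 using `"`1'"', clear
    unab allvars : *
    if strpos("`allvars'", "pidlink") {
        local idlist `"`idlist' `1'"'
        if strpos("`allvars'", "type") {
            local longlist `"`longlist' `1'"'
        }
    }
    macro shift
}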
After that, the local `idlist' should contain all the files that have
a -pidlink- variable, and the local `longlist', the subset of these that
have variables with names containing "type". Note that -strpos()- matches
substrings anywhere in the list of names, so you would likely want to
parse `allvars' for the second filter using regular expressions
(-regexm()-) so that it is only triggered when "type" is at the end of
a variable name. But this should give you a starting point :)
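
The stricter filter mentioned above could look roughly like the sketch below. It is only one possible variant, not tested against the original poster's files: it assumes the .dta files sit in the current directory, have no spaces in their names, and each hold at least one observation. The macro names `haspid' and `islong' are made up for illustration.

```stata
* Sketch only: exact-name check for pidlink, suffix check for "type".
* Assumes files are in the current directory, have no spaces in their
* names, and contain at least one observation each.
local allfiles : dir . files "*.dta"
local idlist
local longlist
foreach f of local allfiles {
    * -clear- discards whatever is currently in memory
    use in 1 using `"`f'"', clear
    unab allvars : *
    * posof returns 0 unless "pidlink" is a whole variable name
    local haspid : list posof "pidlink" in allvars
    if `haspid' {
        local idlist `idlist' `f'
        * flag the file if any variable name ends in "type"
        local islong 0
        foreach v of local allvars {
            if regexm("`v'", "type$") local islong 1
        }
        if `islong' local longlist `longlist' `f'
    }
}
display `"`idlist'"'
display `"`longlist'"'
```

Either list can then drive a later loop such as -foreach f of local longlist-, with the reshape or merge step filled in for the data at hand.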
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.