Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: A problem while dealing with massive amount of data
From
Neil Shephard <[email protected]>
To
[email protected]
Subject
Re: st: A problem while dealing with massive amount of data
Date
Tue, 28 Jun 2011 09:45:26 +0100
On 28 June 2011 09:19, Mayank Mishra <[email protected]> wrote:
> Hello all,
>
> I have around two thousand .csv files in a folder which I need to clean
> and save as Stata .dta files. For this I am running a loop in which the
> -insheet- command reads a file, which is then cleaned and saved.
> There is a variable named "option_typ" which is used twice in the loop
> while cleaning. The problem is that in some files this variable is named
> "optiontype". For those files the do-file gives an error and the loop
> stops, because it cannot find a variable named "option_typ". What makes it
> worse is that I don't know which files have a different variable name
> from the one used in the do-file. Please tell me what I can do in this
> situation.
You don't state which operating system you're working on, but if you're on
a *NIX-based system you could easily use 'grep' to search all your
files and list just the files that match (using the '-l' switch) or
those that don't match (using the '-L' switch), for example...
$ grep -l 'option_typ' *.csv > files_matching_option_typ.txt
$ grep -L 'option_typ' *.csv > files_not_matching_option_typ.txt
...will give you two files, whose names should be self-explanatory.
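Note that grep searches the whole of each file, so it reads every data row and would also count a file as matching if "option_typ" happened to appear in a data row. A minimal sketch that checks only the header line is below, assuming bash and that each CSV has its variable names on the first line. The function name classify_headers is hypothetical; it writes the same two list files as the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort the CSV files in the current folder by whether
# their header (first) line contains a given column name.
# Assumes the variable names are on line 1 of every file.
classify_headers() {
  local pattern="$1"   # column name to look for, e.g. option_typ
  local f
  : > "files_matching_${pattern}.txt"        # create/empty the output lists
  : > "files_not_matching_${pattern}.txt"
  for f in *.csv; do
    [ -e "$f" ] || continue                  # no .csv files: glob stays literal
    # -F: fixed string, -q: quiet (exit status only); only line 1 is read
    if head -n 1 "$f" | grep -qF -- "$pattern"; then
      printf '%s\n' "$f" >> "files_matching_${pattern}.txt"
    else
      printf '%s\n' "$f" >> "files_not_matching_${pattern}.txt"
    fi
  done
}
```

Run it from the folder holding the CSV files as `classify_headers option_typ`.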
You can then use these lists to loop over each group of files separately,
with cleaning code that uses the variable name appropriate to that group.
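An alternative way to use the list of non-matching files, sketched here under the assumption that those files name the column "optiontype" in their header, is to rename that column in the header line so the existing do-file runs unchanged on every file. The function fix_header is hypothetical, and it rewrites files in place, so run it on a copy of the folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every file named in a list (one name per line),
# replace the first occurrence of OLD with NEW in the header line only,
# leaving the data rows untouched. Files are rewritten in place.
fix_header() {
  local old="$1" new="$2" list="$3"
  local f
  # "|| [ -n "$f" ]" also handles a last list line without a newline
  while IFS= read -r f || [ -n "$f" ]; do
    [ -f "$f" ] || continue                   # skip blank or missing entries
    {
      head -n 1 "$f" | sed "s/$old/$new/"     # line 1: rename the column
      tail -n +2 "$f"                         # remaining lines unchanged
    } > "$f.tmp" && mv "$f.tmp" "$f"
  done < "$list"
}

# Usage, with the list written by the grep -L command:
# fix_header optiontype option_typ files_not_matching_option_typ.txt
```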
If you're not on a *NIX system, you can do the same under Microsoft Windows
by installing Cygwin, which provides a UNIX-like environment (see http://x.cygwin.com/).
Neil
--
“Truth in science can be defined as the working hypothesis best suited
to open the way to the next better one.” - Konrad Lorenz
Email - [email protected]
Website - http://kimura.no-ip.org/
Photos - http://www.flickr.com/photos/slackline/