A few days ago a posted a reply on this subject, but it didn't appear on the
list. Here it goes again.
A possible solution would be doing something like:
After converting the excel files to dta using StatTransfer you can run this
code:
clear
cd "c:\folder_with_dta_files"
fs *.dta
local j "0"
foreach f in `r(files)' {
if `j' == 0 use `f'
else append using `f'
local ++j
}
save mydata.dta,replace
It needs to have the -ps- package installed.
Best,
Nuno
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Joseph Coveney
Sent: sexta-feira, 29 de Dezembro de 2006 5:38
To: Statalist
Subject: RE: st: Appending files using stat transfer
Rajesh Tharyan has some 3000 Excel files to append and convert to Stata file
format. As others have mentioned, Stat/Transfer can convert individual
Excel files to individual Stata files and then Stata's -append- command can
append the Stata files, but Rajesh has had trouble in the past running out
of memory using this approach and so seeks an alternative that has
Stat/Transfer do the appending beforehand.
Stat/Transfer *can* append multiple Excel files to a single Access table.
So, one approach is to have Stat/Transfer (i) create a temporary
intermediate Access file with a single table, (ii) append the various Excel
files into it, and then (iii) convert the Access file into a single Stata
file. One hitch with this approach is that you cannot use the wildcard
feature to append multiple source files into a single target file. That is,
COPY *.xls temporary.mdb /Tdummy
won't work. Despite the specification of a single target file, it will
create multiple Access files, each with the name of the corresponding Excel
file. In order to use the append feature of Stat/Transfer, you need to
specify the names of the source files individually.
Typing the names of 3000 Excel files into the Stat/Transfer command
processor is uninviting, so it is best to let Stata do this tedious aspect
for you using a loop: you can have Stata create a Stat/Transfer command
file, as illustrated below. Then, just run the Stat/Transfer command file
from the operating system. (Executing the Stat/Transfer command file as a
shell command from within Stata is noticeably slower than from the operating
system directly. I suppose that this is because of a lower priority granted
spawned processes.)
A variation on this is to have Stat/Transfer create the temporary
intermediate Access file from the first of the multiple Excel source files,
and then to use Stata's -odbc insert- command to do the appending of the
remaining source files, again using a loop. Then, just use the -odbc load-
command to directly read into Stata the rows and columns of the Access file
as needed. This avoids the memory crunch in that no more than one Excel
file is read into memory at any one time. (You might even be able to use
-odbc exec- with a SELECT INTO SQL statement and have the transfer done
directly between the Excel file and the Access table without actually
creating a Stata dataset in memory at all.) In my experience, however,
Stat/Transfer is much faster than -odbc- commands for this kind of thing.
With 3000 files, this difference in speed is liable to be important to you.
Another variation is to loop within Stata using -odbc- to directly read-in
only those columns and rows of the multiple Excel files desired, and do the
appending of the reduced datasets all within the same loop. Again, the
Stat/Transfer approach would probably be faster than this -odbc- approach.
Your ability to use Stata's looping commands to create the list of Excel
file names in the Stat/Transfer command file depends upon the naming
convention you employ for the 3000 Excel files. With numerically
distinguished Excel file names (as in the illustration below), it is
easiest. With alphabetically distinguished file names, you'll need to use
-fs *.xls-, and with 3000 files, you might run into the limit on macro
length. A task of this repetitive nature, a little discipline in file
naming conventions can pay large dividends. Along these lines, adopting
data management standards that eschew using Excel will also help. (Often
you're handed what your data source deems fit to hand you, however.)
In the example below, a Stata program is used to create a clean
Stat/Transfer command file. In the illustration, ten Excel files to be
appended are named "file1.xls", "file2.xls", . . ., and use a -forvalues-
loop. Modify this to suit your circumstances. The command file is named
"xls2dtaBulkXfer.stc", which can be invoked--either as a shell command from
within Stata (not recommended) or from the operating system--to do the
appending-transferring from multiple Excel files into a single Stata file
named "MyCombinedSpreadsheets.dta". As illustrated below, the Stat/Transfer
command file erases the temporary intermediate Access file, but this can be
omitted. (Be sure that there isn't such an Access file with such a table in
existence already in the directory.) The Stat/Transfer command file also
calls for logging the transfer session. This is recommended, but can also
be omitted. One of the important features is the "SET DB-TABLE-APPEND Y"
line, which has Stat/Transfer append the successive files to the Access
table instead of overwriting the table with each successive Excel file.
Joseph Coveney
capture program drop generate_stc
program define generate_stc
capture log close
quietly log using xls2dtaBulkXfer.stc, text replace
display "SET DB-TABLE-APPEND Y"
display "LOG USING xls2dtaBulkXfer.log"
forvalues i = 1/10 {
display "COPY file`i'.xls dummy.mdb /Tdummy /OD"
}
display "COPY dummy.mdb MyCombinedSpreadsheets.dta"
display "!DEL dummy.mdb"
display "QUIT"
quietly log close
end
generate_stc
exit
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/