Re: st: label / macro problem
Jeph--
To conserve memory and maximize speed,
I suggest something like:
* Pass 1: reduce each file to its distinct values of mystring
foreach F of numlist 1/84 {
    use mystring using file`F', clear
    bysort mystring: drop if _n > 1
    save temp`F', replace
}
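* Stack the reduced files (-append- has no clear option;
* -erase- needs the full file name, including .dta)
use temp1, clear
foreach F of numlist 2/84 {
    append using temp`F'
    erase temp`F'.dta
}
erase temp1.dta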
* Build one master value label covering every distinct string
bysort mystring: drop if _n > 1
encode mystring, gen(myint)
label save myint using myint.do, replace
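* Pass 2: encode each file against the master label
* (add ID and any other variables you need to the -use- varlist)
foreach F of numlist 1/84 {
    use mystring using file`F', clear
    do myint
    encode mystring, gen(myint) label(myint)
    save newfile`F', replace
}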
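The idea behind the second pass can be checked on toy data. The sketch below is only an illustration: the file name demo_myint.do and the three-value data are made up, and the noextend option is my addition so that -encode- stops with an error instead of silently adding a string that is missing from the saved label.

```stata
* Hypothetical "master" data: every distinct value of mystring
clear
input str8 mystring
"b"
"a"
"c"
end
* encode numbers the strings alphabetically and names the label myint
encode mystring, gen(myint)
label save myint using demo_myint.do, replace

* Hypothetical "new file" holding a subset of the values;
* -clear- drops both the data and the value labels in memory
clear
input str8 mystring
"c"
"a"
end
* reload the saved label, then encode against it
do demo_myint
encode mystring, gen(myint) label(myint) noextend

* round trip: decoding must give back the original strings
decode myint, gen(check)
assert check == mystring
list
```

Because the label is defined once and only reloaded afterwards, a given string should get the same integer in every file, which is what makes the encoded files safe to append.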
If you want to see what is going on with your local macros, just
-display- them at various points in your program, but the above skips
the use of long-winded macros.
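As a rough sketch of that kind of checking (toy data; the names check and nlabelled are just for illustration), something along these lines shows what the macro holds, how many values the label maps, and whether any integers are left without label text:

```stata
clear
input str10 mystring
"alpha"
"beta"
"alpha"
end
encode mystring, gen(myint)

* what does the local macro actually contain?
local myintlab : value label myint
display "value label attached to myint: `myintlab'"
macro list _myintlab

* how many values does the label map, and what is the largest code?
quietly label list `myintlab'
local nlabelled = r(k)
display "labelled values: `nlabelled', largest code: " r(max)

* are there integers with no label text behind them?
decode myint, gen(check)
count if missing(check) & !missing(myint)
```

Running the last three lines inside the loop, after each -encode-, should show at which file the label stops growing.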
If you acquire a new file, you can always:
use mystring using file85, clear
do myint
encode mystring, gen(myint) label(myint)
lab save myint using myint2.do
On 8/11/06, Jeph Herrin <[email protected]> wrote:
I'm using 9.2, latest update.
My programming problem is to combine a large number of large files;
approximately 84 files of 500k obs each. I only need three variables
from these files, but one of them, -mystring- is str64, which means that
as is, I can't combine these files via appending because my RAM (4GB)
runs out.
However, -mystring- only takes about 5500k different values. So
the solution I am using is to open each file, encode(mystring), save
the label, and then append all prior opened files. The values of
-mystring- are not constant over all the files - new values are added
over time, so I have to update the value labels each time I add a file.
My code looks like this :
u file1, clear
encode mystring, gen(myint)
local myintlab : value label myint
save temp, replace
foreach F of numlist 2/84 {
u file`F', clear
keep ID mystring
encode mystring, gen(myint) label("`myintlab'")
local myintlab : value label myint
append using temp
save temp, replace
}
This seems to work fine up to a point. But after about 30 files,
*something* runs out of space, and the value label ceases to be
updated with new values; -myint- simply holds integers with no
corresponding labels. Now, I understand that 64k value label
values should be allowed, so I don't see a problem there. And
-myintlab- is just a macro holding the name of the set of value
labels. So what else could be going wrong? Or, is there another
way to do this?
NB: The close reader will note that I mention 3 variables in the preamble
but only have two in my code fragment. In fact, I *also* encode a
second string variable; it takes many fewer values, however, and
turns out fine in the end.
In particular, I would appreciate any tips on how to debug what
is happening.
cheers,
Jeph