Hi all.
I am working with some large datasets (at least by my experience). One
is about 2.9 million records of 23 variables each; the other is 16
million records of two variables each. As I try to manipulate the data,
Stata either crawls or crashes. Of course, I have only 256M of memory,
but before I invest in more, I wanted to see if experienced folks
thought Stata would be a good tool for such large datasets, assuming my
computer had more memory.
Most of my manipulations are pretty simple, mostly just selecting
records that meet certain criteria and saving them to a separate file.
The most challenging task, I suspect, will be using Jeroen Weesie's
wonderful mmerge program to do some matching across the files. Rough
sketches of both tasks follow below.
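For concreteness, the selection step is just things like the following
(the variable and file names here are invented):

  * read in only the variables and records I need, then save the subset
  use id year sales if year >= 1995 using bigfile, clear
  keep if sales > 0
  save subset, replace

Reading with -use varlist if exp using- rather than loading the whole
file should keep the memory footprint down, if I understand it right.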
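The matching step would look roughly like this, if I have mmerge's
syntax right (again, the key and file names are made up):

  * match the 2.9M-record file against the 16M-record file on a key
  use bigfile, clear
  mmerge id using lookupfile, type(n:1)
  tab _merge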
If I added sufficient memory (say, increasing my RAM to 1 GB, since the
datasets take up about 200M each), am I likely to find Stata
satisfactory for this task? My computer has a fast Pentium 3, to the
degree that matters. Or should I look toward a dedicated database
management package like MySQL or, Heaven forbid, SAS? (PROC SQL could do
a lot of what I want easily, but I stopped using SAS when I found mmerge
could do most of what I used PROC SQL for.)
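For what it's worth, after installing the RAM I would just raise
Stata's allocation along the lines of:

  * allocate 1 GB to Stata (assuming a version that supports -set memory-)
  set memory 1g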
Any advice would be much appreciated!!
Glenn
Glenn Hoetker
Assistant Professor of Strategy
Department of Business Administration
University of Illinois at Urbana-Champaign
217-355-4891
[email protected]