Sascha,
I have a recommendation that I wouldn't usually make. I have recently
been working with matched employer-employee data with over 30 million
observations, so we have run into the same problem as you. SAS is much
better than Stata at merging large datasets. In particular, proc SQL is
remarkably fast at these types of merges (likely because SQL was
designed with this type of operation in mind).
Well, there it was, likely the last time I will recommend SAS over Stata.
Cheers,
Steve
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Sascha O. Becker
Sent: Thursday, August 19, 2004 6:59 PM
To: [email protected]
Subject: Re: st: selecting obs while reading in huge data set
Dear Daniel,
thanks for your reply!
You suggested:
****
Perhaps you can read the employee and firm ID only?
.insheet empid firmid using mydata
This is only 1/5th the variables, so it might fit in your computer
memory.
Then merge the result with the firm dataset, keeping only matched
records, then merge again with employee dataset, keeping only matched
records.
****
This last step is actually identical to the original problem. "The
employee dataset" is the full dataset with all variables. In order to
merge this to anything, it needs to be in memory at least once, and this
is exactly the problem.
There seems to be no way round some kind of looping, either over
observations, or over subsets of variables that I would merge against
the firm data set and then append/merge those sub-datasets.
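
A rough sketch of the observation-wise loop described above, not a
tested solution. It assumes the employee data are already stored as a
Stata file (here called employees.dta, a hypothetical name) and that
firms.dta has one record per firmid; the chunk size is also an
assumption to be tuned to available memory. The merge syntax shown is
the one used by later Stata releases; older releases would need the
older form of merge.

```stata
* Sketch only: file names, variable names, and chunk size are assumptions.
local chunk 1000000                 // observations per pass

* Read the observation count from the file header without loading the data
describe using employees, short
local N = r(N)

tempfile matched
local first 1

forvalues start = 1(`chunk')`N' {
    local stop = min(`start' + `chunk' - 1, `N')

    * Load one slice of the employee file
    use in `start'/`stop' using employees, clear

    * Keep only employees whose firm appears in the firm file
    merge m:1 firmid using firms, keep(match) nogenerate

    * Add the slices matched so far, then save the running result
    if !`first' {
        append using `matched'
    }
    save `matched', replace
    local first 0
}

* The accumulated matched records
use `matched', clear
```

Only one slice of the full employee file, plus the matched records
accumulated so far, is in memory at any time, so this helps only if the
matched subset is much smaller than the full file.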
Cheers, Sascha
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/