
Re: st: selecting obs while reading in huge data set

From	Daniel Feenberg <[email protected]>
To	[email protected]
Subject	Re: st: selecting obs while reading in huge data set
Date	Wed, 18 Aug 2004 11:14:55 -0400 (EDT)

On Wed, 18 Aug 2004, Sascha O. Becker wrote:

> Dear stata users,
> 
> I have a huge data set A (2 GB in ASCII) with 40 million observations 
> (workers) but only 10 variables. I have another data set B containing 
> information on (a sub-set of) employers and want to select only workers 
> from data set A that are employed in firms in data set B (firm IDs are 
> one variable in data set A).
> 

Perhaps you could read in only the employee and firm IDs?

   . insheet empid firmid using mydata

That is only one fifth of the variables, so it might fit in your computer's
memory. Then merge the result with the firm dataset, keeping only matched
records, and merge again with the employee dataset, again keeping only
matched records.
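
The two merges might be sketched as follows. This is only a sketch under
assumptions: it uses the "merge m:1" / "merge 1:1" syntax of Stata 11 and
later (newer than this post), the file names firms, matched_ids and
workers_full are hypothetical, and it assumes that firmid is unique in the
firm file and empid is unique in the employee file.

```stata
* Assumption: empid and firmid are already in memory, e.g. from the
* insheet step above. firms.dta (hypothetical) has one row per firmid.
merge m:1 firmid using firms, keep(match) nogenerate

* Keep just the IDs of workers at eligible firms.
keep empid
save matched_ids, replace

* Assumption: workers_full.dta (hypothetical) is the full employee file
* in Stata format, with one row per empid.
use matched_ids, clear
merge 1:1 empid using workers_full, keep(match) nogenerate
```

keep(match) drops unmatched records from both sides, which is the "keeping
only matched records" step; in older Stata versions the equivalent would be
a sorted merge followed by "keep if _merge==3".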

Alternatively, if you used "infile" instead of "insheet", you could
use the "if exp" clause to read in only employees at eligible firms. But I
suppose there is some limit on the complexity of the "exp", which would
limit the number of firms you could list. [Actually, insheet might support
an if clause too, but the help file doesn't mention one, as it does for
the infile command.]

   . infile varlist using mydata if firmid==123 | firmid==456 ...
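
For a short list of firms, the condition could be written more compactly
with inlist(). This is a sketch only: the names v3-v10 are placeholders for
the other eight variables, the two firm IDs are the ones from the line
above, and it assumes mydata is a free-format file that infile can read.
inlist() has its own cap on the number of arguments, so it does not remove
the limit on how many firms can be listed.

```stata
* Placeholder names v3-v10 stand in for the remaining eight variables.
* Only observations whose firmid is in the list are kept while reading.
infile empid firmid v3-v10 using mydata if inlist(firmid, 123, 456)
```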
