> -----Original Message-----
> From: Erik Ø. Sørensen [mailto:[email protected]]
> Sent: Friday, November 01, 2002 12:44 PM
> To: [email protected]
> Subject: Re: st: Relative efficiency of merge
>
>
> On Friday, Nov 1, 2002, at 12:22 America/Montreal, Hoetker,
> Glenn wrote:
> > One option I see is merging A with B using the 'nokeep' option
> > and saving the resultant dataset as B_reduced. Since dataset B
> > is fairly large, however, I want this to be as efficient as
> > possible. Is merge at least close to the most efficient way to
> > do this? If not, what might be more efficient?
>
> Have you tried and timed it? I merge files with 3-4 million
> observations regularly, and the cost of this is not so terrible.
> An example: it takes about 25 seconds to merge two datasets of
> 3 million observations on a unique identifier (one dataset had
> 2 variables; I merged in a set with 27 variables).
>
If you are timing different options, be sure to -set rmsg on- beforehand; Stata will then report how long each command takes.
My quick testing suggests that starting with the small dataset and merging in the big one (with the -nokeep- option) is faster than starting with the large dataset, merging the small one in, and then dropping non-matching observations. But it would be easy enough for you to try it both ways yourself.
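
A rough sketch of both orderings, timed with -set rmsg on-. It is purely illustrative: the file names A.dta (the small dataset), B.dta (the large one), B_reduced1.dta, B_reduced2.dta, and the key variable id are all placeholders, and B.dta and A.dta are assumed to have been sorted on id and saved that way, as the old -merge varlist using- syntax requires of the using dataset.

    set rmsg on

    * Approach 1: start from the small dataset, merge in the large one
    * with -nokeep-, then keep only the matched observations
    use A, clear
    sort id
    merge id using B, nokeep
    keep if _merge == 3
    drop _merge
    save B_reduced1, replace

    * Approach 2: start from the large dataset, merge in the small one,
    * then drop the non-matching observations
    use B, clear
    sort id
    merge id using A
    keep if _merge == 3
    drop _merge
    save B_reduced2, replace

    set rmsg off

With -rmsg- on, each command prints its elapsed time, so the two runs can be compared directly; the two saved files should contain the same matched observations.
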
I'm not aware of a non-merge solution.
Nick Winter
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/