
st: Relative efficiency of merge

From	"Hoetker, Glenn" <[email protected]>
To	<[email protected]>
Subject	st: Relative efficiency of merge
Date	Fri, 1 Nov 2002 11:22:54 -0600

Hi all.  I have a question about the efficiency of the 'merge' command.

I have two datasets, A and B.  A consists of about 500 distinct
observations of a single variable, PATENT.  B consists of about 16 million
observations of two variables, one of which is CITED_PATENT.

I would like to keep only the observations of B in which CITED_PATENT
corresponds to one of the values of PATENT contained in A.  

As I work with this data, the contents of A will change from time to
time, so I want this to be easily repeatable.

One option I see is merging A with B using the 'nokeep' option and
saving the resulting dataset as B_reduced.  Since dataset B is fairly
large, however, I want this to be as efficient as possible.  Is merge at
least close to the most efficient way to do this?  If not, what might be
more efficient?
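
For concreteness, the merge approach might look roughly like the sketch
below.  It assumes the datasets are saved as A.dta and B.dta, that A
holds each PATENT value only once, and that the match-variable form of
'merge' with 'nokeep' is available; A_keys is just a hypothetical name
for a temporary copy of A.

```stata
* Sketch only: assumes A.dta and B.dta are in the current directory.
* A_keys is a hypothetical scratch file, not part of the original data.

* Prepare the small key list, renamed so it matches the variable in B
use A, clear
rename PATENT CITED_PATENT
sort CITED_PATENT
save A_keys, replace

* This form of merge needs both datasets sorted by the match variable,
* so the 16 million observations of B are sorted here
use B, clear
sort CITED_PATENT
merge CITED_PATENT using A_keys, nokeep

* With nokeep, _merge is 1 (only in B) or 3 (in both); keep the matches
keep if _merge == 3
drop _merge
save B_reduced, replace

* In Stata 11 and later the merge step would instead be written as:
*   merge m:1 CITED_PATENT using A_keys, keep(match) nogenerate
```

Since only A changes from time to time, a sorted copy of B could be
saved once and reused, so that the sort of B is not repeated on every
run.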

Many thanks!

Glenn Hoetker
Assistant Professor of Strategy
College of Commerce & Business Administration
University of Illinois at Urbana-Champaign
(217) 265-4081
[email protected]

Follow-Ups:
- Re: st: Relative efficiency of merge
  - From: "Erik Ø. Sørensen" <[email protected]>
