Hi all. I have a question about the efficiency of the 'merge' command.
I have two datasets, A and B. A consists of about 500 distinct
observations of a single variable, PATENT. B consists of about 16 million
observations of two variables, one of which is CITED_PATENT.
I would like to keep only the observations of B in which CITED_PATENT
corresponds to one of the values of PATENT contained in A.
As I work with this data, the contents of A will change from time to
time, so I want this to be easily repeatable.
One option I see is merging A with B using the 'nokeep' option and
saving the resultant dataset as B_reduced. Since dataset B is fairly
large, however, I want this to be as efficient as possible. Is merge at
least close to the most efficient way to do this? If not, what might be
more efficient?
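For concreteness, here is a rough sketch of the merge approach I have in
mind. The file names A.dta and B.dta are placeholders, and I am assuming
that PATENT is renamed to CITED_PATENT so the key names match and that B
is stored sorted on that key. It uses the older 'merge varlist using'
syntax that goes with 'nokeep', and I have not timed it on the full data.

```stata
* One-time step (assumed): store B sorted on the merge key
use B, clear
sort CITED_PATENT
save B, replace

* Repeat whenever the contents of A change
use A, clear
rename PATENT CITED_PATENT
sort CITED_PATENT
* nokeep ignores observations in B with no matching value in A
merge CITED_PATENT using B, nokeep
* drop values of A that never appear in B
keep if _merge == 3
drop _merge
save B_reduced, replace
```

With this arrangement the expensive sort of B happens only once, and only
the small dataset A is re-sorted on each run.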
Many thanks!
Glenn Hoetker
Assistant Professor of Strategy
College of Commerce & Business Administration
University of Illinois at Urbana-Champaign
(217) 265-4081
[email protected]