Re: st: nearmrg for strings (titles)
From: Daniel Feenberg <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: nearmrg for strings (titles)
Date: Tue, 30 Aug 2011 08:13:11 -0400 (EDT)
On Tue, 30 Aug 2011, Hoecher, Michaela (0613xxx) wrote:
Hello!
I would like to merge two datasets (variables: title, date, publisher).
The problem is that strings (book titles) that are not exactly the same should still be merged/matched.
- Does it make sense to use nearmrg for this?
- How are the strings merged/matched?
- What would you recommend?
Some time ago I wrote a program to help a clerical worker do this rapidly. The
program finds up to 5 likely matches, and lets the operator select the
best match. I used it once to go through a few thousand journal article
matches but it hasn't been used since. There is documentation at:
http://www.nber.org/imatch
and I would be interested in having a few more users. It is interactive,
but it isn't a GUI program - it runs from the command line and the
operator makes selections with the keyboard.
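To illustrate the general idea of offering an operator up to 5 likely matches per title, here is a minimal sketch in Python. It is not the code or the algorithm behind the program mentioned above; it only assumes the standard library's difflib similarity ratio as the scoring rule. The function names (normalize, candidate_matches) and the 0.6 cutoff are hypothetical choices made for this example.

```python
import difflib


def normalize(title):
    """Lowercase the title and replace punctuation with spaces,
    so that trivial differences do not affect the comparison."""
    kept = [ch.lower() if ch.isalnum() else " " for ch in title]
    return " ".join("".join(kept).split())


def candidate_matches(title, master_titles, limit=5, cutoff=0.6):
    """Return up to `limit` (score, master_title) pairs, best first.

    The score is difflib's similarity ratio (0.0 to 1.0) between the
    normalized titles. The cutoff is an arbitrary illustrative value.
    """
    target = normalize(title)
    scored = []
    for candidate in master_titles:
        matcher = difflib.SequenceMatcher(None, target, normalize(candidate))
        score = matcher.ratio()
        if score >= cutoff:
            scored.append((score, candidate))
    # Highest similarity first; the operator would pick from this short list.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

An operator-facing loop could print the returned candidates with a number next to each and read the choice from the keyboard. Keeping a person in the loop is what lets a fairly loose cutoff be used without producing false merges.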
Note that most commercial code to do matching is oriented towards
address matching, and won't be particularly adept at author/title
matching.
Dan Feenberg