I seem to recall that there's an algorithm that is able to crosswalk
databases by matching names combined with other secondary keys, such as
zip code, and that the algorithm will produce a "probability of match"
for the given ID. I used to conduct match merges based on name and zip
in an earlier version of Stata, but it was quite cumbersome to deal with
misspellings, typos (common transpositions of letters or numbers, etc.),
all caps vs. lower case, prefixes and suffixes, titles, middle initial
versus middle name, and so on. What I'd like to know is whether a more
sophisticated match/merge based on primary and secondary keys or IDs has
been developed, and if so, where I can find some documentation on how it
works. Also, would it account for very common names, such as "David
Jones", versus less common names, like "Horace Vilochkek", or for the
size of the database, and adjust the probability of match accordingly?
Or is all of this just some pipe dream I happened to think up when I was
under the influence?
I'll also try to scrounge up something on the FAQ database, but most of
my text documentation on Stata 9.2 is stored in boxes since I'm in the
midst of a move, and I need at least some idea of the capability of such
a match/merge within the week.
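
To make the idea concrete, here is a rough sketch of the kind of thing I
have in mind. It is in Python rather than Stata, purely for illustration,
and it is not an existing Stata command or necessarily the algorithm I'm
half-remembering. The function names, the title and suffix lists, and the
value m = 0.95 are all my own assumptions. The weight is in the spirit of
Fellegi-Sunter probabilistic record linkage, where agreement on a rare
name counts for more than agreement on a common one.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Illustrative lists only; a real cleanup would need far more entries.
TITLES = {"mr", "mrs", "ms", "dr", "prof"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd", "md"}


def normalize_name(raw):
    """Lower-case, strip punctuation, drop titles and suffixes,
    and keep only the first and last remaining tokens."""
    tokens = re.sub(r"[^a-z\s]", " ", raw.lower()).split()
    tokens = [t for t in tokens if t not in TITLES and t not in SUFFIXES]
    if len(tokens) > 2:
        tokens = [tokens[0], tokens[-1]]  # drop middle names and initials
    return " ".join(tokens)


def name_similarity(a, b):
    """Similarity in [0, 1] between two normalized names, so that typos
    and transpositions still score close to 1."""
    return SequenceMatcher(None, a, b).ratio()


def agreement_weight(name, name_counts, m=0.95):
    """Log2 weight for agreeing on `name`.

    m is the assumed probability that true matches agree on name;
    u is approximated by the name's relative frequency in the file,
    so common names earn a smaller weight than rare ones.
    """
    total = sum(name_counts.values())
    u = max(name_counts[name], 1) / total
    return math.log2(m / u)


# Hypothetical name frequencies for a 100-record file.
counts = Counter({"david jones": 50, "horace vilochkek": 1, "other name": 49})
common_weight = agreement_weight("david jones", counts)
rare_weight = agreement_weight("horace vilochkek", counts)
```

With these made-up counts, agreeing on "horace vilochkek" earns a much
larger weight than agreeing on "david jones". A full procedure would add
similar weights for zip code and any other keys, then convert the total
into a match probability.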
Scott Talkington, PhD
[email protected]