
st: matching misspelled names

From	Clyde Schechter <[email protected]>
To	[email protected]
Subject	st: matching misspelled names
Date	Fri, 23 Aug 2002 16:07:09 -0400

I have a dataset, one of whose variables contains names of drugs.  Many of
the entries are misspelled or truncated.  I have an index file with a
reasonably complete list of commercial and generic drug names.  After
merging the files and identifying exact matches, I would like to match
the remaining, presumably misspelled, drug names with the corresponding
correct names from the index.  When the names are those of people, the
soundex algorithm usually provides a reasonably short list of candidate
matches.  But with these drug names, many of the misspellings match
several dozen candidates, making the resulting list of names and candidate
matches too long for manual review and selection.
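
To make the problem concrete, here is a minimal sketch of the candidate-matching step described above, written in Python rather than Stata. It assumes the standard American Soundex rules (first letter plus three digits), which may differ in detail from Stata's soundex() function. The drug names, the misspelling, and the helper names soundex and candidate_matches are illustrative assumptions, not taken from the actual data.

```python
from collections import defaultdict

# Standard American Soundex digit groups (vowels, H, W and Y have no digit).
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate consonants with the same digit
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated digit is kept after it
    return (code + "000")[:4]  # pad or truncate to one letter plus three digits


def candidate_matches(unmatched, index_names):
    """Map each unmatched name to the index names that share its Soundex code."""
    by_code = defaultdict(list)
    for name in index_names:
        by_code[soundex(name)].append(name)
    return {name: by_code.get(soundex(name), []) for name in unmatched}


# Hypothetical index entries and one hypothetical misspelling.
INDEX = ["metformin", "metoprolol", "methotrexate", "methadone", "metronidazole"]
for name, candidates in candidate_matches(["metphormin"], INDEX).items():
    print(name, soundex(name), candidates)
```

In this toy index, "metformin" and "metoprolol" share a code because Soundex discards everything after the third coded consonant.  That truncation is one plausible reason why long drug names with common stems would collide far more often than surnames do.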

Does anybody out there know of an alternative to soundex coding that might
work better in this peculiar vocabulary?  Or of another approach to this
problem?

Thanks in advance for any help.

Clyde Schechter
Dept. of Family Medicine & Community Health
Albert Einstein College of Medicine
Bronx, NY, USA


Follow-Ups:
- st: Re: matching misspelled names
  - From: "Michael Blasnik" <[email protected]>
