Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: matching strings on words

From	Simon <[email protected]>
To	[email protected]
Subject	Re: st: matching strings on words
Date	Tue, 30 Mar 2010 20:56:51 +0100

I've had a similar issue:

http://www.stata.com/statalist/archive/2008-08/msg00477.html

Simon




On 30/03/2010 20:00, Jeph Herrin wrote:


I'm not sure what to call this - if I did, I might have
better luck with my searches for a utility. Basically,
I want to do something similar to the utility -nmatch-
which matches first and last names, but I have more than
two words per record.

The problem: I have two files with lists of hospital names.
Hospital names tend to consist of multiple words, which get
used to different extents; the same hospital might be listed
as:

st joseph's
st joseph's memorial
st joseph's memorial hospital
st joseph's memorial hospital of danbury

etc. (There is also a lot of variation, e.g. "Saint" vs "St." and
"Memorial" vs "memorial", but I have trapped most of those
already.)
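
As an illustration of the kind of clean-up mentioned above, here is a minimal Python sketch that folds case, strips punctuation, and maps spelling variants to one form. The `ABBREVIATIONS` table and the `normalize` function are hypothetical names invented for this example, and dropping the possessive 's is an extra assumption that the original post does not describe.

```python
import re

# Hypothetical variant map (an assumption for illustration); extend for real data.
ABBREVIATIONS = {"saint": "st"}

def normalize(name):
    """Return a lowercased, punctuation-free name with variants unified."""
    text = name.lower()                       # "Memorial" vs "memorial"
    text = re.sub(r"'s\b", "", text)          # assumption: "joseph's" -> "joseph"
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # "st." -> "st"
    words = [ABBREVIATIONS.get(w, w) for w in text.split()]
    return " ".join(words)

if __name__ == "__main__":
    print(normalize("Saint Joseph's Memorial"))
    print(normalize("St. Joseph's memorial"))
```

Both calls produce the same normalized string, so the two spellings would compare as equal in a later word-by-word match.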

What I'd like to do is match these on "words", and generate
a _merge variable which indicates how many words match vs
how many words there are. Then I (or some unlucky grad student)
can trawl through the matches and decide which ones are the
same hospital.
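
One way the word-level match described above could be sketched, outside Stata, is shown below in Python. This is not an existing utility: `word_match`, `candidate_matches`, and the `min_shared` threshold are hypothetical names chosen for this example, and words are assumed to be separated by whitespace after any clean-up has already been done.

```python
def word_match(name_a, name_b):
    """Return (shared words, words in name_a, words in name_b)."""
    words_a = set(name_a.lower().split())
    words_b = set(name_b.lower().split())
    shared = len(words_a & words_b)
    return shared, len(words_a), len(words_b)

def candidate_matches(list_a, list_b, min_shared=1):
    """Compare every name in list_a with every name in list_b.

    Keeps pairs sharing at least min_shared words, best matches first,
    so a human reviewer can decide which pairs are the same hospital.
    """
    rows = []
    for a in list_a:
        for b in list_b:
            shared, n_a, n_b = word_match(a, b)
            if shared >= min_shared:
                rows.append((a, b, shared, n_a, n_b))
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows

if __name__ == "__main__":
    file_1 = ["st joseph's memorial"]
    file_2 = ["st joseph's memorial hospital of danbury", "mercy general"]
    for row in candidate_matches(file_1, file_2):
        print(row)
```

The three counts play the role of the requested _merge-style indicator: a pair with 3 shared words out of 3 and 6 is a strong candidate, while pairs sharing only a common word such as "hospital" sort lower. The all-pairs comparison is quadratic in the number of names, which is an acceptable trade-off only for lists of modest size.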

I can see how to write a program to do such a thing, but am hoping
there is already a solution out there that I overlooked?

thanks,
Jeph

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/


