Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Simon <scmoore.lists@googlemail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: matching strings on words |
Date | Tue, 30 Mar 2010 20:56:51 +0100 |
I've had a similar issue:
http://www.stata.com/statalist/archive/2008-08/msg00477.html

Simon

On 30/03/2010 20:00, Jeph Herrin wrote:
I'm not sure what to call this; if I did, I might have better luck searching for a utility. Basically, I want to do something similar to the utility -nmatch-, which matches first and last names, but I have more than two words per record.

The problem: I have two files with lists of hospital names. Hospital names tend to consist of multiple words, which are used to different extents; the same hospital might be listed as:

  st joseph's
  st joseph's memorial
  st joseph's memorial hospital
  st joseph's memorial hospital of danbury

etc. (There is also a lot of variation, e.g. "Saint" vs "St." and "Memorial" vs "memorial", but I have trapped most of those already.)

What I'd like to do is match these on "words" and generate a _merge variable that indicates how many words match vs. how many words there are. Then I (or some unlucky grad student) can trawl through the matches and decide which ones are the same hospital.

I can see how to write a program to do such a thing, but am hoping there is already a solution out there that I overlooked.

thanks,
Jeph

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
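Jeph mentions he could write such a program himself. The word-overlap idea he describes can be sketched as follows, written in Python purely for illustration: it is not -nmatch- or any existing Stata command, and the function names `words` and `word_match` are hypothetical. It assumes the names have already been normalized (e.g. "Saint" vs "St." trapped) and treats a "word" as a run of letters, digits, or apostrophes. For each pair of names it reports the number of shared distinct words alongside the number of distinct words in each name, mirroring the "how many words match vs how many words there are" summary.

```python
import re
from itertools import product


def words(name):
    """Split a name into a set of lowercase word tokens (apostrophes kept).

    Assumption: a word is any run of letters, digits, or apostrophes;
    other punctuation such as the "." in "St." is dropped.
    """
    return set(re.findall(r"[a-z0-9']+", name.lower()))


def word_match(names_a, names_b, min_common=1):
    """Compare every name in names_a with every name in names_b.

    Returns (name_a, name_b, n_common, n_a, n_b) tuples for pairs sharing
    at least min_common distinct words, where n_common is the number of
    shared words and n_a, n_b are the distinct word counts of each name.
    """
    rows = []
    for a, b in product(names_a, names_b):  # all pairwise combinations
        wa, wb = words(a), words(b)
        common = len(wa & wb)
        if common >= min_common:
            rows.append((a, b, common, len(wa), len(wb)))
    # Most shared words first; among ties, fewer distinct words overall
    # (smaller union n_a + n_b - n_common) ranks higher.
    rows.sort(key=lambda r: (-r[2], r[3] + r[4] - r[2]))
    return rows


if __name__ == "__main__":
    file1 = ["st joseph's memorial hospital", "mercy hospital"]
    file2 = ["st joseph's",
             "st joseph's memorial hospital of danbury",
             "mercy medical center"]
    for row in word_match(file1, file2):
        print(row)
```

Because every name in one file is compared with every name in the other, the work grows with the product of the two list sizes. Generic words such as "hospital" or "of" also produce weak matches (here "mercy hospital" pairs with the Danbury entry on "hospital" alone), so one option is to drop such words before comparing. A Stata version could follow the same steps, for example by forming all pairs with -cross- and then counting shared words.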