On Wed, Aug 13, 2008 at 8:17 AM, Simon Moore <[email protected]> wrote:
Dear Statalist,
I have a string variable that contains values something like this:-
"outside the red lion pub"
"red lion"
"in the red lyon"
and so on.
I need to search this variable for names (e.g. "red lion") and would like to
do so in a way that copes with the inevitable typos (e.g. "red lyon").
How about using Michael Blasnik's soundex() function for -egen- (part of the
egenmore package on SSC), which implements the Soundex algorithm as given by
Donald Knuth:
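* soundex() for -egen- comes with egenmore: ssc install egenmore
clear
input str25 var1
"outside the red lion pub"
"red lion"
"red lion"
"redlion"
"in the red lyon"
"blue lion"
"red troll"
end
* Soundex code of the whole string, with the code length set to 12
egen foo = soundex(var1), length(12)
* d and t code as 3, m and n as 5: flag codes with a 3 followed later by a 5
gen tag = regexm(foo, "3.*5")
list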
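
The pattern "3.*5" is fairly loose, since it is applied to the code for the
whole string. As a rough alternative, and only a sketch, you could compare
word by word. This assumes your Stata has the built-in soundex() string
function alongside word() and wordcount(); the names nwords and tag2 are made
up for illustration.

```stata
* Sketch only: flag rows where two consecutive words sound like "red" "lion"
gen nwords = wordcount(var1)
summarize nwords, meanonly
local maxw = r(max)
local red = soundex("red")
local lion = soundex("lion")
gen byte tag2 = 0
forvalues i = 1/`maxw' {
    * word() returns "" past the last word, so the extra pass is harmless
    replace tag2 = 1 if soundex(word(var1, `i')) == "`red'" & soundex(word(var1, `i' + 1)) == "`lion'"
}
list var1 tag2
```

Note that this would not catch "redlion" typed as one word, so you may want to
combine it with the whole-string approach rather than replace it.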
Scott