This is a big problem.
You might want to investigate using soundex to help match the misspelt names but, depending on which version of soundex you use, it may not be particularly useful.
A while ago, Michael Blasnik wrote an egen function that implements a soundex algorithm for Stata 7:
http://ideas.repec.org/c/boc/bocode/s420901.html
You could try that.
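
To give a feel for what a soundex code does and where it falls short, here is a minimal sketch of the common American Soundex rules. It is written in Python purely for illustration; it is not Michael Blasnik's egen function, and other soundex versions may code some names differently. The function name `soundex` and the choice to strip accents before coding are my own assumptions.

```python
import unicodedata

# Letter groups for American Soundex: group i (1-based) maps to digit i.
# Vowels, Y, H and W carry no digit.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return a 4-character American Soundex code for a single word.

    Assumption: accents are stripped first, so that accented and
    unaccented spellings are coded the same way.
    """
    decomposed = unicodedata.normalize("NFKD", name.upper())
    letters = [ch for ch in decomposed if "A" <= ch <= "Z"]
    if not letters:
        return ""

    code = letters[0]                   # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # its digit still counts for collapsing
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal digits
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit                    # a vowel resets prev, allowing repeats
    return (code + "000")[:4]           # pad with zeros, truncate to 4


for word in ["PÉREZ", "PEREZ", "SMITH", "SMITHSS"]:
    print(word, soundex(word))
```

With these rules an accented and an unaccented spelling such as PÉREZ and PEREZ receive the same code (P620), but SMITH (S530) and SMITHSS (S532) do not, because the extra trailing letters still fit inside the four-character code. That is the sense in which soundex may help with some typos and not with others.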
______________________________________________
Kieran McCaul MPH PhD
WA Centre for Health & Ageing (M573)
University of Western Australia
Level 6, Ainslie House
48 Murray St
Perth 6000
Phone: (08) 9224-2140
Phone: +61-8-9224-2140
email: [email protected]
http://myprofile.cos.com/mccaul
_______________________________________________
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Max Perez Leon
Sent: Friday, 8 August 2008 5:03 AM
To: [email protected]
Subject: st: Matching Names
Hello statalist users,
I am having a big problem trying to merge two datasets on names. The problem is
that there are tons of typos in both datasets. Examples below:
DATASET 1: --------------------- DATASET 2:
NAMES -------------------------- NAMES
LUIS PÉREZ --------------------- LUIS PÉREZ
WILLIAM SMITH ------------------ WILLIAM SMITHSS
JORGE F. CHOCAN ---------------- JORGE F CHOCANOS
P. BROWN ----------------------- PAUL BROWN
ENRIQUETA GAUDENCIA ------------ ENRIQUETA G
I could do it by hand, but I have 52568 obs and more to come. I am trying to
establish a method using regular expressions so that I can merge the datasets
correctly.
Any help will be very much appreciated,
Thanks for your time,
Max Perez Leon
PUCP-IEP
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/