I guess everyone will agree that this kind of problem is a big deal and a big pain.
It's also a common one.
Last month Rufus Peabody started a similar thread: see the start at
<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0807/Author/article-87.html>
Subsequently, Jeph Herrin and Eva Poen put together their contributions to this thread, with some further thoughts. Their combined advice will appear as a Stata Journal Tip in Stata Journal 8(3) 2008.
Nick
Max Perez Leon
I am having a big problem trying to merge two datasets by name. The problem is
that there are tons of typos in both datasets. Examples below:
DATASET 1: --------------------- DATASET 2:
NAMES--------------------------- NAMES
LUIS PÉREZ --------------------- LUIS PÉREZ
WILLIAM SMITH ------------------ WILLIAM SMITHSS
JORGE F. CHOCAN ---------------- JORGE F CHOCANOS
P. BROWN ----------------------- PAUL BROWN
ENRIQUETA GAUDENCIA------------- ENRIQUETA G
I could do it by hand, but I have 52568 observations and more to come. I am trying to
establish a method using regular expressions so that I can merge the datasets
correctly.
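
To make the idea concrete, here is a rough sketch of what such a cleanup might look like. It is written in Python rather than Stata, and it is an illustration only, not the method from the thread or the forthcoming Tip. The function names normalize_name and best_match and the 0.8 similarity threshold are assumptions made up for this example. The sketch uses regular expressions to strip punctuation and extra spaces, then falls back on a simple string-similarity score, since regular expressions alone cannot catch typos such as SMITH versus SMITHSS.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name):
    """Uppercase, strip accents and punctuation, collapse whitespace."""
    # Decompose accented letters and drop the combining marks (E-acute -> E).
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a letter or a space with a space.
    letters = re.sub(r"[^A-Z ]+", " ", no_accents.upper())
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", letters).strip()


def best_match(name, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate.

    The candidate is None when the best score is below the threshold.
    The threshold of 0.8 is an arbitrary assumption and would need tuning.
    """
    key = normalize_name(name)
    best, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, key, normalize_name(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


if __name__ == "__main__":
    dataset2 = ["WILLIAM SMITHSS", "JORGE F CHOCANOS", "PAUL BROWN", "ENRIQUETA G"]
    for name in ["WILLIAM SMITH", "JORGE F. CHOCAN", "P. BROWN", "ENRIQUETA GAUDENCIA"]:
        print(name, "->", best_match(name, dataset2))
```

Heavily abbreviated names (ENRIQUETA G for ENRIQUETA GAUDENCIA, say) may score below whatever threshold is chosen, so pairs that fail to match would still need checking by hand.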