Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Merging datasets using non-identical addresses/strings as identifiers
From: Benjamin Niug <[email protected]>
To: [email protected]
Subject: st: Merging datasets using non-identical addresses/strings as identifiers
Date: Sun, 5 Feb 2012 12:48:16 +0100
Hi folks,
I have a specific merging question. I want to merge two datasets
that use addresses as the identifiers of the observations. However,
the addresses differ marginally between the datasets, which is why I
cannot use the simple -merge- command. The spelling can differ: there
are many systematic differences (e.g. bulevard instead of boulevard)
but also non-systematic ones (e.g. simple spelling mistakes). In
addition, I want to merge addresses that may differ in the house
number.
A stylized example (note the different spelling and house number):
11 Sunset Boulevard, Tirana, Albania
to be merged with
13 Sunset Bulevard, Tirane
So far, I have tried to tackle this problem using regular expressions,
but this does not work well, because regular expressions typically
only handle the systematic differences. Does anybody have a suggestion
for a procedure that I could use for this problem?
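
For illustration only, the kind of procedure in question could be
sketched as follows. This is a minimal sketch in Python rather than
Stata, under stated assumptions: the VARIANTS table, the function
names normalize, similarity and best_match, and the 0.6 threshold are
all hypothetical and not part of any existing package. Systematic
differences are handled by a lookup table, house numbers are dropped,
and the remaining non-systematic differences are handled by a string
similarity score from the standard library's difflib.

```python
import re
from difflib import SequenceMatcher

# Hypothetical table of systematic spelling variants (an assumption for
# illustration; a real table would be built from the data).
VARIANTS = {"bulevard": "boulevard", "tirane": "tirana"}


def normalize(address):
    """Lowercase, drop digits and punctuation, and map known variants."""
    text = address.lower()
    # Replace everything except a-z and whitespace with a space. This
    # removes house numbers and commas (and also any non-ASCII letters).
    text = re.sub(r"[^a-z\s]", " ", text)
    words = [VARIANTS.get(word, word) for word in text.split()]
    return " ".join(words)


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized addresses."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(address, candidates, threshold=0.6):
    """Return the most similar candidate, or None if none is close enough."""
    if not candidates:
        return None
    score, match = max((similarity(address, c), c) for c in candidates)
    return match if score >= threshold else None


master = "11 Sunset Boulevard, Tirana, Albania"
using = ["13 Sunset Bulevard, Tirane", "4 Main Street, Durres"]
print(best_match(master, using))
```

The threshold is arbitrary and would have to be tuned by inspecting
borderline pairs by hand; a threshold that is too low produces false
matches between different streets.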
Thanks in advance!