Re: st: Merging datasets using non-identical addresses/strings as identifiers
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Merging datasets using non-identical addresses/strings as identifiers
Date: Sun, 5 Feb 2012 12:09:02 +0000
No easy answer. Some ideas at

SJ-8-3  dm0039  Stata tip 64: Cleaning up user-entered string variables
        J. Herrin and E. Poen
        Q3/08   SJ 8(3):444--445   (no commands)
        tip on how to clean up user-entered string variables
If 11 can be 13, and vice versa, almost anything goes!
Nick
On Sun, Feb 5, 2012 at 11:48 AM, Benjamin Niug
<[email protected]> wrote:
> I have a specific merging question. I want to merge two datasets
> that use addresses as the identifiers of the observations. However,
> the addresses differ slightly between the datasets, so I cannot use
> the simple -merge- command. They may differ in spelling: there are
> many systematic differences (e.g. bulevard instead of boulevard) but
> also non-systematic ones (e.g. simple spelling mistakes). In addition,
> I want to merge addresses that differ in the house number.
>
> A stylized example (notice different spelling):
> 11 Sunset Boulevard, Tirana, Albania
>
> to be merged with
>
> 13 Sunset Bulevard, Tirane
>
> So far, I have tried to tackle this problem using regular expressions,
> but that does not work well, as regular expressions typically handle
> only systematic differences. Can anybody suggest a procedure that I
> could use for this problem?
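
One possible approach to the problem described in the question is to clean the strings first and then match on similarity rather than equality. The sketch below is in Python with the standard library only, not Stata, and is not a method proposed by anyone in this thread. The replacement table, the function names and the 0.8 threshold are all assumptions made for illustration. Dropping the leading house number means 11 and 13 on the same street are treated as the same address, which may or may not be acceptable for a given dataset.

```python
import re
from difflib import SequenceMatcher

# Hypothetical table of known systematic variants; extend it for real data.
REPLACEMENTS = {"bulevard": "boulevard", "tirane": "tirana"}


def normalize(address):
    """Lowercase, strip punctuation and the leading house number,
    then map known spelling variants to one canonical form."""
    text = address.lower()
    text = re.sub(r"[^\w\s]", " ", text)    # punctuation -> spaces
    text = re.sub(r"^\s*\d+\s+", "", text)  # drop a leading house number
    words = [REPLACEMENTS.get(word, word) for word in text.split()]
    return " ".join(words)


def similarity(a, b):
    """Similarity of two normalized addresses, between 0 and 1."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(address, candidates, threshold=0.8):
    """Return the most similar candidate, or None if nothing
    reaches the (assumed) threshold."""
    if not candidates:
        return None
    score, match = max((similarity(address, c), c) for c in candidates)
    return match if score >= threshold else None


if __name__ == "__main__":
    other_dataset = ["13 Sunset Bulevard, Tirane", "5 Main Street, Durres"]
    print(best_match("11 Sunset Boulevard, Tirana, Albania", other_dataset))
```

The replacement table handles the systematic differences, and the similarity score absorbs some of the non-systematic ones. Any threshold will produce both false matches and missed matches, so the matched pairs should be inspected by hand before the datasets are actually merged.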