Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Merging datasets using non-identical addresses/strings as identifiers


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Merging datasets using non-identical addresses/strings as identifiers
Date   Sun, 5 Feb 2012 12:09:02 +0000

No easy answer. Some ideas at

SJ-8-3  dm0039  . . .  Stata tip 64: Cleaning up user-entered string variables
        . . . . . . . . . . . . . . . . . . . . . . . .  J. Herrin and E. Poen
        Q3/08   SJ 8(3):444--445                                 (no commands)
        tip on how to clean up user-entered string variables

If 11 can be 13, and vice versa, almost anything goes!

Nick

On Sun, Feb 5, 2012 at 11:48 AM, Benjamin Niug
<[email protected]> wrote:

> I have a specific question about merging. I want to merge two datasets
> that use addresses as the identifiers of the observations. However,
> these addresses differ slightly, which is why I cannot use the
> simple -merge- command. They may differ in spelling: there are many
> systematic differences (e.g. bulevard instead of boulevard) but also
> non-systematic ones (e.g. simple spelling mistakes). In addition, I
> want to merge addresses that differ in the house number.
>
> A stylized example (notice different spelling):
> 11 Sunset Boulevard, Tirana, Albania
>
> to be merged with
>
> 13 Sunset Bulevard, Tirane
>
> So far, I have tried to tackle this problem using regular expressions,
> but that does not work well, as regular expressions typically handle
> only systematic differences. Does anybody have a suggestion for a
> procedure that I could use for this problem?
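
One way to picture the two-stage approach implied by the question (clean up the systematic differences first, then tolerate the non-systematic ones with an approximate comparison) is the minimal sketch below. It is written in Python rather than Stata purely to illustrate the logic; in Stata the cleaning step could be done with lower(), subinstr() and regexr(). The replacement table, the decision to compare only street and city, the function names and the 0.85 threshold are all assumptions made for this example, not anything proposed in the thread.

```python
import re
from difflib import SequenceMatcher

# Hypothetical table of systematic spelling variants; build yours from the data.
SYSTEMATIC = {"bulevard": "boulevard"}

def normalize(address, parts=2):
    """Lowercase, keep the first `parts` comma-separated fields,
    drop a leading house number, and apply the systematic replacements."""
    fields = [f.strip() for f in address.lower().split(",")][:parts]
    text = " ".join(fields)
    text = re.sub(r"^\d+\s+", "", text)      # ignore the house number
    text = re.sub(r"[^\w ]+", " ", text)     # replace punctuation with spaces
    words = [SYSTEMATIC.get(w, w) for w in text.split()]
    return " ".join(words)

def best_match(address, candidates, threshold=0.85):
    """Return (candidate, score) for the most similar candidate,
    or (None, score) if no candidate reaches the threshold.
    Assumes `candidates` is non-empty."""
    key = normalize(address)
    scored = [(SequenceMatcher(None, key, normalize(c)).ratio(), c)
              for c in candidates]
    score, cand = max(scored)
    return (cand, score) if score >= threshold else (None, score)

# The stylized example from the question, plus one unrelated (made-up) address.
using = ["13 Sunset Bulevard, Tirane", "4 Harbour Road, Durres"]
match, score = best_match("11 Sunset Boulevard, Tirana, Albania", using)
print(match, round(score, 3))
```

Dropping the house number makes every address on the same street compare as equal, so a threshold like this only narrows the candidates; the resulting pairs would still need to be checked by eye before merging.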

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

