Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Extract a letter between numbers
From
Patrick McNamara <[email protected]>
To
[email protected]
Subject
Re: st: Extract a letter between numbers
Date
Mon, 22 Nov 2010 16:21:31 -0500
Those both sound like good ideas. Any advice on how to execute them
after install? :)
To give an idea of what I'm working with, I've listed a correct
address and some examples of address problems below:
5654 N Oak St Chicago, Illinois
56e54 Oak st Chicago, Illinois
5654 North Oak Chicago Illinois
5654 No. Oak St
5654 Oak St
There may be more than one of these issues present in a single address
entry. What I'm trying to do right now is find the length of the first
three words after the house number (5654), then use the longest and
2nd longest to see which has a better matching rate. But nearmrg or
strgroup may work much better.
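
The word-length idea described above can be sketched as follows. This is only an illustration, written in Python rather than Stata, and it assumes the house number is always the first whitespace-separated token. The function name longest_words is hypothetical. In Stata the same steps could be built from the string functions word() and length().

```python
def longest_words(address, n_words=3):
    """Return the longest and second-longest of the first n_words tokens
    that follow the house number (assumed to be the first token)."""
    # Treat commas as separators so "Chicago," and "Chicago" compare equal.
    tokens = address.replace(",", " ").split()
    candidates = tokens[1:1 + n_words]
    # Stable sort by descending length: ties keep their original order.
    ranked = sorted(candidates, key=len, reverse=True)
    longest = ranked[0] if ranked else ""
    second = ranked[1] if len(ranked) > 1 else ""
    return longest, second


if __name__ == "__main__":
    examples = [
        "5654 N Oak St Chicago, Illinois",
        "56e54 Oak st Chicago, Illinois",
        "5654 North Oak Chicago Illinois",
        "5654 No. Oak St",
        "5654 Oak St",
    ]
    for address in examples:
        print(address, "->", longest_words(address))
```

On the sample addresses this picks "Oak" and "St" for the correct entry, but "Chicago" and "North" when the street suffix is missing, so the longest word is not always the street name.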
Patrick
On Mon, Nov 22, 2010 at 3:41 PM, Dimitriy V. Masterov
<[email protected]> wrote:
> I think you may want to fuzzy merge your dirty address data and your
> clean data using nearmrg, which you can get from SSC.
>
> An alternative would be to append your two data sets and then use
> strgroup on the variable that is the stacked version of your clean and
> dirty addresses. That will give you the closest match.
>
> Neither one will be perfect, and either may take a long time or fail
> if you have too much data. The latter approach has some operating system
> restrictions as well.
>
> DVM
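
The idea of matching each dirty address to its closest clean one can be illustrated outside Stata. The sketch below is Python and is not strgroup itself. It assumes a Levenshtein edit distance divided by the length of the shorter string as the similarity measure, and a threshold of 0.25 chosen only for illustration. The names levenshtein and closest_match are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def closest_match(dirty, clean_list, threshold=0.25):
    """Return the clean address nearest to dirty, or None if even the
    nearest one exceeds the (assumed) normalized-distance threshold."""
    best, best_score = None, None
    d = dirty.lower()
    for clean in clean_list:
        c = clean.lower()
        # Normalize by the shorter string's length; guard against empty input.
        score = levenshtein(d, c) / max(min(len(d), len(c)), 1)
        if best_score is None or score < best_score:
            best, best_score = clean, score
    if best_score is not None and best_score <= threshold:
        return best
    return None


if __name__ == "__main__":
    clean = ["5654 N Oak St Chicago, Illinois",
             "120 W Elm Ave Chicago, Illinois"]
    print(closest_match("56e54 Oak st Chicago, Illinois", clean))
```

Comparing every dirty entry with every clean entry grows with the product of the two list sizes, which is one reason this kind of matching can become slow on large data.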
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>