Re: st: fuzzy merge problem
From: Scott Merryman <[email protected]>
To: [email protected]
Subject: Re: st: fuzzy merge problem
Date: Wed, 22 Sep 2010 09:58:01 -0500
On Tue, Sep 21, 2010 at 4:52 PM, Dimitriy V. Masterov
<[email protected]> wrote:
<snip>
> I tried merging on the first word in the county name and the state,
> but that runs into problems with county names that begin with Spanish
> articles.
Perhaps you could elaborate on this or give a more extensive example.
Would this method work? It extracts the clean county names from the
county data set and then looks for each one inside the messy names:
clear
input str20 county
"BUTTE, CA"
"BUTTE, ID"
"BUTTE, SD"
"BUTTS, GA"
"CABARRUS, NC"
"CONTRA COSTA, CA"
"SAN LUIS OBISPO, CA"
end
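* Drop the trailing ", XX" (comma, space, two-letter state) to get the bare county name
gen county2 = substr(county, 1, length(county) - 4)
* Store the distinct county names in the local macro levels
levelsof county2, local(levels)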
clear
input str13 ndma str29 county
"CHICO-REDDING" "BUTTE (C-SPLIT), CA"
"CHICO-REDDING" "BUTTE (REMAINDER), CA"
"CINCINNATI" "ADAMS, OH"
"CINCINNATI" "BOONE, KY"
"CINCINNATI" "BRACKEN, KY"
"Concord" "CONTRA COSTA, CA"
"Concord" "(C-SPLIT) CONTRA COSTA, CA"
"Pismo Beach" "SAN LUIS OBISPO, CA"
end
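* The state abbreviation is the last two characters of the name
gen state = substr(county, -2, .)
gen str county2 = ""
* Assign each clean county name to the rows whose messy name contains it
foreach l of local levels {
    replace county2 = "`l'" if regexm(county, "`l'")
}
list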
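If the goal is then to merge the two files on the cleaned county name plus state, the loop above might be carried through roughly as follows. This is only a sketch under assumptions that are not in the original post: the tempfile name counties is made up for illustration, every name is assumed to end in ", XX" with a two-letter state code, and the example rows are a subset of the ones above. It also swaps regexm() for strpos(), which searches for a literal substring, so a county name containing regular-expression metacharacters would not be misread as a pattern.

```stata
* Sketch only: the tempfile name counties and the merge keys are assumptions
clear
input str20 county
"BUTTE, CA"
"CONTRA COSTA, CA"
"SAN LUIS OBISPO, CA"
end
gen county2 = substr(county, 1, length(county) - 4)
gen state = substr(county, -2, .)
levelsof county2, local(levels)
* Keep only the merge keys in the using file
drop county
tempfile counties
save `counties'

clear
input str13 ndma str29 county
"CHICO-REDDING" "BUTTE (C-SPLIT), CA"
"Concord" "(C-SPLIT) CONTRA COSTA, CA"
"Pismo Beach" "SAN LUIS OBISPO, CA"
end
gen state = substr(county, -2, .)
gen str20 county2 = ""
foreach l of local levels {
    * Literal substring search instead of a regular expression
    replace county2 = "`l'" if strpos(county, "`l'") > 0
}
* Many DMA rows can map to one county row, hence m:1
merge m:1 county2 state using `counties'
list ndma county county2 state _merge
```

Either search function matches substrings, so a short name such as BUTTE would also be assigned to any longer county name that contains it; merging on state as well as county2 limits, but does not remove, that risk.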
Scott