st: fuzzy merge problem
From: "Dimitriy V. Masterov" <[email protected]>
To: Statalist <[email protected]>
Subject: st: fuzzy merge problem
Date: Tue, 21 Sep 2010 17:52:57 -0400
I have a dataset of US counties and DMAs that looks like this:
ndma             county
CHICO-REDDING    BUTTE (C-SPLIT), CA
CHICO-REDDING    BUTTE (REMAINDER), CA
CINCINNATI       ADAMS, OH
CINCINNATI       BOONE, KY
CINCINNATI       BRACKEN, KY
I also have a dataset of counties that looks like this:
county
BUTTE, CA
BUTTE, ID
BUTTE, SD
BUTTS, GA
CABARRUS, NC
The problem is that in the second dataset, the county BUTTE, CA is not
split into two regions. There are many cases like this (too many to fix
by hand), and I cannot simply delete the text in parentheses, since the
qualifier is not always in parentheses and its wording varies. I can't
use FIPS codes since they are not available in the first dataset. I need
to merge these datasets to use the DMA information.
I tried merging on the first word in the county name and the state,
but that runs into problems with county names that begin with Spanish
articles. I tried M Blasnik's -reclink- (v 1.7 14-Jan-2010), but I
get:
. reclink county using ".\ihs_counties.dta", idmaster(county)
idusing(county) gen(match);
variable county not found
The variable county certainly exists in both datasets and it is a valid id.
I am using Stata 11.1 on a 64-bit Windows machine.
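A minimal sketch of the first-word-plus-state key described above, using only string functions available in Stata 11. It assumes every county string ends in ", ST" (comma, space, two-letter state), and the variable names cstate, cfirst, and ckey are hypothetical, chosen for illustration only:

```stata
* Sketch only: build a merge key from the first word of the county
* name plus the state. Assumes each county string ends in ", ST".
* cstate, cfirst, and ckey are illustrative names, not from the data.
clear
input str15 ndma str30 county
"CHICO-REDDING" "BUTTE (C-SPLIT), CA"
"CHICO-REDDING" "BUTTE (REMAINDER), CA"
"CINCINNATI"    "ADAMS, OH"
end

* last two characters are taken to be the state abbreviation
gen cstate = substr(county, -2, 2)

* first whitespace-delimited word, with any trailing comma removed
gen cfirst = subinstr(word(county, 1), ",", "", .)

* combined key in the same "NAME, ST" form as the second dataset
gen ckey = cfirst + ", " + cstate

list ndma county ckey
```

Because both BUTTE rows map to the same key, the key is not unique in this dataset, so a many-to-one merge (merge m:1) on the key would be the natural form. The sketch also shows why the approach breaks down: any two counties in the same state whose names share a first word, such as those beginning with an article, collapse to the same key.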
Any suggestions?
Dimitriy Masterov