[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[no subject]
*****
clear
cap ssc install reclink
**
input str44 var1 var2
"68th precinct youth council inc " 1
"action center for education and community development, inc " 2
"amistad child day care and family center inc" 3
end
replace var1 = lower(var1)
replace var1 = ltrim(var1)
replace var1 = rtrim(var1)
replace var1 = substr(var1, 1, 10)
sort var1
gen idusing = _n
save "datasetusing.dta", replace
clear
input str44 var1 var3
" 68th precinct youth council, inc." 3
"action center for education and community development-ps 106" 5
" amistad early childhood educational center inc" 5
end
replace var1 = lower(var1)
replace var1 = ltrim(var1)
replace var1 = rtrim(var1)
replace var1 = substr(var1, 1, 10)
sort var1
gen idmaster = _n
save "datasetmaster.dta", replace
//merge with reclink//
reclink var1 using "datasetusing.dta", idmaster(idmaster) idusing(idusing) gen(_match) minscore(.75)
li var1 Uvar1 _match
*******
Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
On Dec 1, 2009, at 9:15 AM, Nick Cox wrote:
> There's no easy and failsafe solution here.
>
> -merge- doesn't know about meanings or approximate matches. It's entirely literal.
>
> I can think of two strategies.
>
> 1. You just need to work on one or indeed both datasets to produce variables that will merge. There's detailed advice within
>
> SJ-8-3 dm0039 . . . Stata tip 64: Cleaning up user-entered string variables
> . . . . . . . . . . . . . . . . . . . . . . . . J. Herrin and E. Poen
> Q3/08 SJ 8(3):444--445 (no commands)
> tip on how to clean up user-entered string variables
>
> 2. You could try soundex or similar tricks. Your example doesn't look encouraging for that strategy.
>
> Nick
> [email protected]
>
> Meryle Weinstein, Ph.D.
>
> I have two datasets that Im trying to merge by the following string
> variables: agencyname sitename siteaddress. There are slight differences
> in the datasets, particularly in the agencyname and sitename variables so
> I'm having trouble merging the two datasets. The problem seems to be that
> the agencyname differs slightly in each of the datasets. For example
>
> Dataset2 dataset2
> 68th precinct youth council inc 68th precinct
> youth council, inc.
> action center for education and community development, inc action center
> for education and community development-ps 106
> amistad child day care and family center inc amistad early
> childhood educational center inc
>
> Any suggestions on a way to merge by these three variables would be
> appreciated.
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
© Copyright 1996–2024 StataCorp LLC | Terms of use | Privacy | Contact us | What's new | Site index |