Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: encode results in false match - merge/joinby
From: joe j <[email protected]>
To: [email protected]
Subject: st: encode results in false match - merge/joinby
Date: Thu, 10 Feb 2011 22:07:09 +0100

I want to highlight something I ran into while merging two data sets
on an encoded merge variable. In reality the two tables have no
matching values at all, and that is what I get when I join on the
string variable 'code'. But if I encode 'code' into a new variable
'code1' and join on that instead, every observation matches. (I no
longer remember why I encoded this variable; there must have been a
reason, but it was certainly not for merging.)

Below is an example in which two files are joined on the string
variable 'code' and on the encoded variable 'code1'; the latter
results in a false perfect match. I wonder whether this strange
behavior of encoded variables is limited to 'joinby', or whether it
could also be an issue in other contexts. Thanks for any pointers.

clear
input id str5 code
1 "123J5"
2 "68741"
3 "297J5"
4 "14856"
5 "AB234"
6 "25K45"
7 "12535"
end
* Create a numeric (value-labeled) version of the string variable
encode code, gen(code1)
sort code1
save file1.dta, replace
clear
input id str5 code
1 "243J5"
2 "68348"
3 "479H5"
4 "467G5"
5 "23TUB"
6 "TU501"
7 "32LK8"
end
encode code, gen(code1)
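* Join on the encoded variable: every observation matches (false match)
joinby code1 using file1.dta, unmatched(both)
* Join on the string variable instead: no observation matches (correct)
*joinby code using file1.dta, unmatched(both)
tabulate _merge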
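
A likely explanation, offered as a sketch rather than a definitive
answer: -encode- numbers the distinct strings 1, 2, ... in sorted
order, separately within each data set. With seven distinct codes in
each file, both 'code1' variables take the values 1 to 7 even though
the strings behind them differ, so any command that matches on 'code1'
compares the integers and not the strings. Assuming file1.dta was
saved as above, the integer codes can be inspected and the join
repeated on the string variable:

```stata
* Inspect the integer codes behind code1 in the using file.
* By default encode names the value label after the new variable.
use file1.dta, clear
label list code1
tabulate code1, nolabel

* Rebuild the second file and join on the string variable instead.
clear
input id str5 code
1 "243J5"
2 "68348"
3 "479H5"
4 "467G5"
5 "23TUB"
6 "TU501"
7 "32LK8"
end
joinby code using file1.dta, unmatched(both)
tabulate _merge
```

Because the two files share no value of 'code', _merge should show
only unmatched observations here. If a numeric key is needed across
files, one option is to append the files first and run -encode- once
on the combined data, so that both share a single mapping.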