Re: st: "Wrong" result with encode / merge ?
Assume you have these id7temp codes in your master data-set:
AT18679
AT18680
AT18681
Assume you have these id7temp codes in your using data-set:
AT18679
AT18681
AT18682
If you -encode- the id7temp variable in each data-set, you will get the
following id7 variables:
In the master data-set:
1=AT18679
2=AT18680
3=AT18681
In the using data-set:
1=AT18679
2=AT18681
3=AT18682
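That is, -encode- simply numbers the distinct strings 1, 2, 3, ... in
alphabetical order, separately within each data-set. The same number can
therefore stand for different strings in the two data-sets.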
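To see this for yourself, here is a minimal sketch that types in the three
master codes from above by hand and encodes them (the data are just the
example values, nothing from your real files). Running the same lines with
the three using codes should show 2 pointing at AT18681 instead of AT18680.

```stata
* Illustration only: enter the three example master codes by hand.
clear
input str7 id7temp
"AT18679"
"AT18680"
"AT18681"
end

* encode numbers the distinct strings in alphabetical order and stores
* the mapping in a value label named after the new variable.
encode id7temp, generate(id7)

* Show which integer was assigned to which string.
label list id7
```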
If you -merge- the using data-set with the master data-set on id7, you
will get the following (wrong) match:
1 "AT18679" = 1 "AT18679"
2 "AT18680" = 2 "AT18681"
3 "AT18681" = 3 "AT18682"
That's not what you want. I don't think that you should -encode- your
values. Simply -merge- on id7temp, not id7.
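A rough sketch of what I mean, assuming Stata 11 or newer (for the
-merge 1:1- syntax) and that the 7-character key is unique within each
data-set. The file names master_data.dta, using_data.dta and
using_id7.dta are made up for illustration; replace them with your own.

```stata
* Sketch only: all file names here are hypothetical.
* Build the 7-character string key in the using data-set and save it.
use using_data.dta, clear
generate id7temp = substr(id, 1, 7)
isid id7temp                    // stops with an error if the key is not unique
save using_id7.dta, replace

* Build the same key in the master data-set.
use master_data.dta, clear
generate id7temp = substr(id, 1, 7)
isid id7temp

* Merge on the string key itself, not on an encoded copy of it.
merge 1:1 id7temp using using_id7.dta

* Check how many observations matched and how many did not.
tabulate _merge
```

If -isid- complains, two ids share the same first 7 characters (for example
because they differ only in the optional trailing letter), and you will
need to decide how to handle those before merging.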
HTH,
Philipp
Thomas Erdmann wrote:
Hi,
I have a dataset with ids that look like: AT18679U (two letters followed by
5 digits, optionally followed by another letter).
In the two datasets I would like to merge, only the first 7 characters are
equal, therefore I generated
generate id7temp=substr(id,1,7)
encode id7temp, gen(id7)
sort id7
and merged the two datasets by id7. When I quality checked the results there
were several mismatches, which don't seem to happen if I use the string id
and not the encoded one. Why is that?
Thanks in advance
-Tom
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/