st: String variable behaving oddly

From   Anna Reimondos <[email protected]>
To   [email protected]
Subject   st: String variable behaving oddly
Date   Thu, 11 Oct 2012 21:33:37 +1100

Dear Statalist

I am currently cleaning a survey dataset with a variety of numeric as
well as string variables. I recently discovered some very odd
behaviour with one of the string variables that I have to deal with
before I can finish my work.

In the example below there are 23 responses from people who answered a
question about who they believe is the most influential sports person
in Australia. All these 23 people answered the same thing 'Evonne
Goolagong Cawley' (some famous sports lady).

The problem is that when I do a simple tab of the variable there are
two entries for Evonne Goolagong Cawley instead of just one. I don't
understand what is happening.

. tab var1

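    [F4a] Most influential sportspeople: |
                              1st choice |      Freq.     Percent        Cum.
-----------------------------------------+-----------------------------------
                 Evonne Goolagong Cawley |          2        8.70        8.70
                 Evonne Goolagong Cawley |         21       91.30      100.00
-----------------------------------------+-----------------------------------
                                   Total |         23      100.00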

Two respondents are somehow being identified as having a different
answer from the rest of the people even though the spelling is exactly
the same. I have tried trimming the data, triple-checking the spelling
and so on, but can't get to the bottom of this and it is driving me
up the wall.

Just for reference, this 'issue' is affecting other entries as well,
where what looks to me like exactly the same response is not
recognised as such.

Any help would be much appreciated.

I have a copy of the dataset (just an extract) if anyone is interested.
