Hi,
I have a question. I have tried countmatch and also the FAQ on identifying distinct observations across variables, but I need something slightly different and can't figure out how to do it.
Here is the problem. I have a set of ids, each matched with several names that all refer to the same entity. The data look like this:
id, name1, name2, name3
A1, AOL, AOL Time Warner, AOL
A2, Time, Time Inc, AOL
A3, Microsoft, MS Office, Micsoft
A4, AL, AOL, Bla
I need some way to recognize that A1, A2 and A4 (and all the names attached to each of them) refer to the same entity. Is there a way to create a new variable, say "same_entity", that identifies the observations where one or more of the names reappear? Here is what I would like to get:
id, name1, name2, name3, same_entity
A1, AOL, AOL Time Warner, AOL, 1
A2, Time, Time Inc, AOL, 1
A3, Microsoft, MS Office, Micsoft, 2
A4, AL, AOL, Bla, 1
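One way to think about the grouping logic: each id is a node, and two ids belong together if they share a name, either directly or through a chain of ids that share names. That is a connected-components problem, and union-find (disjoint sets) solves it. The sketch below is in Python rather than Stata and only illustrates the logic. The function names `find`, `union` and `same_entity` are hypothetical, names are matched exactly as strings, and groups are numbered in order of first appearance so that the output matches the table above.

```python
# Sketch only: union-find grouping of rows that share at least one name.
# Python illustration of the logic, not a Stata solution. Names must match
# exactly, so "AL" and "AOL" (or "Micsoft" and "Microsoft") are not linked.

def find(parent, x):
    # Walk up to the root of x's set, halving the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    # Merge the sets containing a and b.
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def same_entity(rows):
    """rows: list of (id, [names]). Returns {id: group number}, with
    groups numbered 1, 2, ... in order of first appearance."""
    parent = {}
    owner = {}  # name -> first id seen with that name
    for rid, names in rows:
        parent.setdefault(rid, rid)
        for name in names:
            if name in owner:
                union(parent, owner[name], rid)  # shared name links the ids
            else:
                owner[name] = rid
    labels, result = {}, {}
    for rid, _ in rows:
        root = find(parent, rid)
        if root not in labels:
            labels[root] = len(labels) + 1
        result[rid] = labels[root]
    return result

data = [
    ("A1", ["AOL", "AOL Time Warner", "AOL"]),
    ("A2", ["Time", "Time Inc", "AOL"]),
    ("A3", ["Microsoft", "MS Office", "Micsoft"]),
    ("A4", ["AL", "AOL", "Bla"]),
]
print(same_entity(data))  # {'A1': 1, 'A2': 1, 'A3': 2, 'A4': 1}
```

Because linking is transitive, two ids with no name in common still end up in the same group if some third id shares a name with each of them.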
Is there a way to do this, or something similar? Any ideas or suggestions would be much appreciated.
Best,
dalhia