Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: how to identify strings among which some are abbreviated and group strings which have the same keywords
From: Nina <[email protected]>
To: statalist <[email protected]>
Subject: st: how to identify strings among which some are abbreviated and group strings which have the same keywords
Date: Wed, 9 Nov 2011 16:02:26 +0100

Dear all,
I have two questions that I would appreciate your help with.
The first one:
My dataset has a string variable, applicant, that names the applicant of each patent. I want to identify applicants uniquely, so I used -encode applicant, gen(firm)- to generate a numeric identifier. However, the same applicant sometimes appears under its full name and sometimes under an abbreviated name. For example:
application number   applicant
1                    Mcneil consumer
2                    Mcneil cons
When I use -encode-, two different identifiers are generated for the same applicant, "Mcneil consumer". Do you have any suggestions for handling this case?
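
To make the first question concrete, here is a minimal sketch of one possible starting point, not a settled solution. It assumes that an abbreviated name is always a leading truncation of the full name (as "Mcneil cons" is of "Mcneil consumer"), so that after sorting, the shorter name sits directly before the longer one. The variable names name, newfirm and firm are made up for illustration, and the rule will wrongly merge two distinct applicants whenever one name happens to be a prefix of the other.

```stata
* Hypothetical sketch: treat a name as an abbreviation of the next name
* (in sorted order) when it is a prefix of that name.
clear
input appno str40 applicant
1 "Mcneil consumer"
2 "Mcneil cons"
end

* Normalize case and surrounding blanks before comparing
generate str40 name = upper(trim(applicant))
sort name

* Start a new firm unless the previous (shorter) name is a prefix of this one
generate byte newfirm = (_n == 1) | (strpos(name, name[_n-1]) != 1)

* The running sum of the flags gives one identifier per firm
generate firm = sum(newfirm)
list appno applicant firm
```

With the two example rows, both applications should end up with the same value of firm. Abbreviations that are not simple truncations would need a different rule or manual review.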
The second one:
The dataset is similar to the one above. In this case, I want to generate a group id that assigns a single id to all applicants that are subsidiaries of the same company. For example, in the data below, I want an id equal to 1 for applications 1 and 2, because the applicants belong to "Mcneil", and an id equal to 2 for applications 3 and 4, because they belong to the Mylan group.
application number   applicant
1                    MCNEIL PEDIATRICS
2                    MCNEIL CONSUMER HEALTHCARE DIV MCNEIL PPC INC
3                    MYLAN LABORATORIES INC
4                    MYLAN PHARMACEUTICALS INC
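
For the second question, the desired grouping can be sketched as follows. This is only an illustration under a strong assumption: that the first word of the applicant name is the keyword identifying the parent company, which holds for the four example rows but need not hold in general. The variable names parent and groupid are hypothetical.

```stata
* Hypothetical sketch: group applicants by the first word of the name.
clear
input appno str60 applicant
1 "MCNEIL PEDIATRICS"
2 "MCNEIL CONSUMER HEALTHCARE DIV MCNEIL PPC INC"
3 "MYLAN LABORATORIES INC"
4 "MYLAN PHARMACEUTICALS INC"
end

* Take the first word of the normalized name as the (assumed) parent keyword
generate str60 parent = word(upper(trim(applicant)), 1)

* egen's group() numbers the distinct values of parent 1, 2, ... in sorted order
egen groupid = group(parent)
list appno applicant parent groupid
```

Because group() numbers the keywords in sorted order, MCNEIL comes before MYLAN, so applications 1 and 2 should get id 1 and applications 3 and 4 should get id 2. Names whose parent keyword is not the first word would be grouped incorrectly by this rule.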
Any suggestions and comments are more than welcome!
Thank you very much!
Best,
Nina