Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: how to identify strings among which some are abbreviated and group strings which have the same keywords
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: RE: how to identify strings among which some are abbreviated and group strings which have the same keywords
Date: Wed, 9 Nov 2011 15:14:43 +0000
It is difficult to give really good advice here. But for both your questions, you could -encode- a new variable that holds just the first word of the applicant name, which you can extract using the -word(,)- function.
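A minimal sketch of that suggestion, using the applicant names from the question as toy data. The variable names appno, firstword, firm and groupid are assumptions made for illustration, as is the use of -upper()- and -trim()- to stop differences in case or stray spaces from splitting a group. -egen, group()- is shown only as an alternative to -encode- when a plain integer id without value labels is wanted.

```stata
* Toy data copied from the examples in the question (appno is a made-up name)
clear
input appno str60 applicant
1 "MCNEIL PEDIATRICS"
2 "MCNEIL CONSUMER HEALTHCARE DIV MCNEIL PPC INC"
3 "MYLAN LABORATORIES INC"
4 "MYLAN PHARMACEUTICALS INC"
5 "Mcneil consumer"
6 "Mcneil cons"
end

* First word of each name, trimmed and upper-cased so that
* "Mcneil" and "MCNEIL" end up with the same key
generate firstword = upper(word(trim(applicant), 1))

* Numeric identifier carrying the first word as its value label
encode firstword, generate(firm)

* Alternative: a plain integer group id without value labels
egen groupid = group(firstword)

list appno applicant firm groupid, sepby(firm)
```

Matching on the first word alone is crude: unrelated applicants that happen to share a first word would be merged, and one applicant spelled two ways in its first word would be split, so the resulting groups are worth checking by eye, for example with -tabulate firstword-.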
See also
SJ-8-3 dm0039 . . . Stata tip 64: Cleaning up user-entered string variables
. . . . . . . . . . . . . . . . . . . . . . . . J. Herrin and E. Poen
Q3/08 SJ 8(3):444--445 (no commands)
tip on how to clean up user-entered string variables
Nick
[email protected]
Nina
The first one:
My dataset has a string variable that records the applicant for each patent. I want to identify applicants uniquely, so I use -encode applicant, gen(firm)- to generate a numeric identifier. However, the same applicant sometimes appears under its full name and sometimes abbreviated. For example,
application number    applicant
1                     Mcneil consumer
2                     Mcneil cons
When I use -encode-, two different identifiers are generated for the same applicant, "Mcneil consumer". Do you have any suggestions for dealing with this case?
The second one:
The dataset is similar to the one above. In this case, I want to generate a group id that assigns a single id to all applicants that are subsidiaries of the same company. For example, in the following data, I want an id equal to 1 for applications 1 and 2, because the applicants belong to "Mcneil", and an id equal to 2 for applications 3 and 4, because they belong to the Mylan group.
application number    applicant
1                     MCNEIL PEDIATRICS
2                     MCNEIL CONSUMER HEALTHCARE DIV MCNEIL PPC INC
3                     MYLAN LABORATORIES INC
4                     MYLAN PHARMACEUTICALS INC