Dear Prof. Martin,
Thank you so much.
This works like a charm.
Best
dalhia
--- On Tue, 8/18/09, Martin Weiss <[email protected]> wrote:
> From: Martin Weiss <[email protected]>
> Subject: st: AW: rationalizing multiple ids for the same name
> To: [email protected]
> Date: Tuesday, August 18, 2009, 10:28 AM
>
> <>
>
> Everything rides on what "the same name" means: Sometimes there is an "Inc"
> at the end, sometimes not. If you are willing to assume that some part of
> the string for "Name" needs to match, you can use the function -substr()-
> to extract part of it, but I would imagine that to be rather hazardous.
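A minimal sketch of the -substr()- idea follows. It assumes that matching on the first 15 characters of the trimmed name is good enough. The helper variable name_key and the cutoff of 15 are hypothetical choices for this example and are not part of the original reply.

```stata
clear
input str20 Name str20 Ticker
"AOL Time Warner"     "TWX"
"AOL Time Warner Inc" "T"
"Microsoft"           "MS"
end

* hypothetical grouping key: first 15 characters of the trimmed name
* (15 is an arbitrary cutoff that happens to fit "AOL Time Warner")
generate name_key = substr(trim(Name), 1, 15)
list Name name_key, noobs
```

The hazard is visible in the design. Any two distinct companies whose names share the first 15 characters would receive the same key and be treated as one firm.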
>
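> Subsequently, you can use -egen, mode()- to get the most frequent ticker
> within the newly created "names". Note that if two tickers are equally
> frequent within a name, -mode()- returns missing unless you add the
> -minmode- or -maxmode- option.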
>
> Here is the second part:
>
>
> ***
> clear*
>
> input str20(Name Ticker)
> "AOL Time Warner" "AOL"
> "AOL Time Warner" "TW"
> "AOL Time Warner" "TWX"
> "AOL Time Warner" "TWX"
> "AOL Time Warner" "T"
> "Microsoft" "MS"
> end
>
> compress
>
> //trim leading and trailing blanks from the name
> replace Name=trim(Name)
>
> //most frequent ticker within each name (missing if tickers tie)
> bys Name: egen freqtick= /*
> */ mode(Ticker)
> list, noobs
> ***
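The sketch below combines both steps on the original data, including the "Inc" variants. It assumes that a trailing " Inc" is the only spelling difference that matters. name_key is a hypothetical helper variable. The -maxmode- option is one arbitrary way to break ties, and -minmode- would work just as well.

```stata
clear
input str20 Name str20 Ticker
"AOL Time Warner"     "AOL"
"AOL Time Warner"     "TW"
"AOL Time Warner"     "TWX"
"AOL Time Warner Inc" "TWX"
"AOL Time Warner Inc" "T"
"Microsoft"           "MS"
end

* cleaned grouping key: trim blanks, then drop a trailing " Inc"
generate name_key = trim(Name)
replace name_key = trim(substr(name_key, 1, length(name_key) - 4)) if substr(name_key, -4, .) == " Inc"

* most frequent ticker per cleaned name; maxmode picks one value on ties
* (without it, egen mode() returns "" when two tickers are equally frequent)
bysort name_key: egen New_Ticker = mode(Ticker), maxmode
list Name Ticker New_Ticker, noobs
```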
>
>
>
> HTH
> Martin
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> On behalf of Dalhia
> Sent: Tuesday, August 18, 2009 06:17
> To: [email protected]
> Subject: st: rationalizing multiple ids for the same name
>
> Dear Statalist, I have a question and I am hoping for some help.
>
> I have a very large dataset of companies over time, and I have two
> different identifiers for these companies: name and ticker. The problem is
> that the two identifiers are not always consistent. For instance:
>
> Name, Ticker
>
> AOL Time Warner, AOL
> AOL Time Warner, TW
> AOL Time Warner, TWX
> AOL Time Warner Inc, TWX
> AOL Time Warner Inc, T
> Microsoft, MS
>
> Basically, the first 5 observations provide data about the same entity,
> AOL Time Warner, and I need a way of recognizing that these are all the
> same company. What I think will work is to check those names for which
> multiple tickers exist, take the ticker that appears most often in the
> dataset, and put this most frequent ticker in a new variable New_Ticker.
> Here is how the data should now look:
>
> Name, Ticker, New_Ticker
>
> AOL Time Warner, AOL, TWX
> AOL Time Warner, TW, TWX
> AOL Time Warner, TWX, TWX
> AOL Time Warner Inc, TWX, TWX
> AOL Time Warner Inc, T, TWX
> Microsoft, MS, MS
>
> I am unable to figure out how to create this new variable New_Ticker,
> which holds the most frequently used ticker in cases where the same name
> has multiple tickers. I would be very grateful for any help on how to
> create such a variable.
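The first step described above is finding names that carry more than one ticker. It can be sketched as follows on the example data. The flag variable multi is hypothetical. The check works because, after sorting by Ticker within Name, the first and last tickers differ only when a name has at least two distinct tickers.

```stata
clear
input str20 Name str20 Ticker
"AOL Time Warner"     "AOL"
"AOL Time Warner"     "TW"
"AOL Time Warner"     "TWX"
"AOL Time Warner Inc" "TWX"
"AOL Time Warner Inc" "T"
"Microsoft"           "MS"
end

* flag names with two or more distinct tickers:
* within each Name, sorted by Ticker, compare the first and last ticker
bysort Name (Ticker): generate byte multi = Ticker[1] != Ticker[_N]
list Name Ticker multi, sepby(Name) noobs
```

For a name with a single ticker, -egen, mode()- simply returns that ticker. The flag is therefore mainly useful for inspecting conflicts and is not strictly required before computing New_Ticker.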
>
> Best
> dalhia
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/