Statalist



Re: st: AW: rationalizing multiple ids for the same name


From   Dalhia <[email protected]>
To   [email protected]
Subject   Re: st: AW: rationalizing multiple ids for the same name
Date   Tue, 18 Aug 2009 11:55:06 -0700 (PDT)

Dear Prof. Martin, 

Thank you so much.

This works like a charm.

Best
dalhia

--- On Tue, 8/18/09, Martin Weiss <[email protected]> wrote:

> From: Martin Weiss <[email protected]>
> Subject: st: AW: rationalizing multiple ids for the same name
> To: [email protected]
> Date: Tuesday, August 18, 2009, 10:28 AM
> 
> <> 
> 
> Everything rides on what "the same name" means: sometimes there is an "Inc"
> at the end, sometimes not. If you are willing to assume that some part of
> the string for "Name" needs to match, you can use the function -substr()- to
> extract part of it, but I would imagine that to be rather hazardous.
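
The name-matching step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: it assumes the names differ solely by a trailing "Inc" (optionally "Inc."), it swaps in -regexr()- for -substr()- so that only that suffix is removed, and CleanName is a hypothetical helper variable.

```stata
* Sketch: treat names that differ only by a trailing "Inc" as one company
clear
input str25 Name str5 Ticker
"AOL Time Warner"     "AOL"
"AOL Time Warner"     "TW"
"AOL Time Warner"     "TWX"
"AOL Time Warner Inc" "TWX"
"AOL Time Warner Inc" "T"
"Microsoft"           "MS"
end

* CleanName is an assumed helper variable, not part of the original data;
* the pattern strips one trailing " Inc" or " Inc." and nothing else
generate CleanName = regexr(trim(Name), " +Inc[.]?$", "")

* most frequent ticker within each cleaned name
bysort CleanName: egen New_Ticker = mode(Ticker)
list Name Ticker New_Ticker, noobs
```

Real company names vary in many more ways (Corp, Ltd, punctuation, case), so any such rule should be checked against a listing of the distinct names before it is trusted.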
> 
> Subsequently, you can use -egen, mode()- to get the most frequent ticker
> within the newly created "names".
> 
> Here is the second part:
> 
> 
> ***
> clear*
> 
> input str20(Name Ticker)
> "AOL Time Warner" "AOL"
> "AOL Time Warner" "TW"
> "AOL Time Warner" "TWX"
> "AOL Time Warner" "TWX"
> "AOL Time Warner" "T"
> "Microsoft" "MS" 
> end
> 
> compress
> 
> // trim leading and trailing blanks so identical names group together
> replace Name=trim(Name)
> 
> // most frequent ticker within each name
> bys Name: egen freqtick = mode(Ticker)
> list, noobs
> ***
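
One caveat with -egen, mode()- is worth a sketch: when two or more tickers are equally frequent within a name, the function returns missing unless a tie-breaking option such as -minmode- or -maxmode- is given. The "Acme" rows and the variable names tick_plain, tick_min and multi below are made up for illustration.

```stata
clear
input str20 Name str5 Ticker
"Acme"      "AC"
"Acme"      "ACM"
"Microsoft" "MS"
end

* with two equally frequent tickers, plain mode() leaves the result missing
bysort Name: egen tick_plain = mode(Ticker)

* minmode picks the lowest (alphabetically first) of the tied values instead
bysort Name: egen tick_min = mode(Ticker), minmode

* flag names that carry more than one distinct ticker, for manual review
bysort Name (Ticker): generate byte multi = Ticker[1] != Ticker[_N]
list, noobs
```

Which tie-breaker is appropriate depends on the data; flagging the tied groups and resolving them by hand may be safer than choosing one mechanically.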
> 
> 
> 
> HTH
> Martin
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> On Behalf Of Dalhia
> Sent: Tuesday, 18 August 2009 06:17
> To: [email protected]
> Subject: st: rationalizing multiple ids for the same name
> 
> Dear Statalist, I have a question and I am hoping for some help.
> 
> I have a very large dataset of companies over time, and I have two different
> identifiers for these companies: name and ticker. The problem is that the
> two identifiers are not always consistent. For instance:
> 
> Name, Ticker
> 
> AOL Time Warner, AOL
> AOL Time Warner, TW
> AOL Time Warner, TWX
> AOL Time Warner Inc, TWX
> AOL Time Warner Inc, T
> Microsoft, MS
> 
> Basically, the first 5 observations provide data about the same entity, AOL
> Time Warner, and I need a way of recognizing that these are all the same
> company. What I think will work is to check those names for which multiple
> tickers exist, take the ticker that appears in the dataset most often, and
> put this most frequent ticker in a new variable, New_Ticker. Here is how the
> data should then look:
> 
> Name, Ticker, New_Ticker
> 
> AOL Time Warner, AOL, TWX
> AOL Time Warner, TW, TWX
> AOL Time Warner, TWX, TWX
> AOL Time Warner Inc, TWX, TWX
> AOL Time Warner Inc, T, TWX
> Microsoft, MS, MS
> 
> I am unable to figure out how to create this new variable, New_Ticker, which
> holds the most frequently used ticker in cases where the same name has
> multiple tickers. I will be very grateful for any help on how to create
> such a variable.
> 
> Best
> dalhia
> 
> 
>       
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



