Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Identify duplicate observations by a varlist, then drop them based on other variables
From: Aaron Kirkman <[email protected]>
To: [email protected]
Subject: Re: st: Identify duplicate observations by a varlist, then drop them based on other variables
Date: Mon, 1 Oct 2012 13:29:33 -0500
Hi Nick,
That looks like the best solution, so I'll use --duplicates tag--.
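A minimal sketch of how that might look on a toy version of the data. The variable name dup, the two-date subset, and the choice of NASDAQ as the exchange to drop are illustrative assumptions, not part of the original advice:

```stata
* Toy data mirroring the first two dates of the example in the question
clear
input int date str6 exchange str3 symbol float adjclose
8496 "NASDAQ" "ADP" 1.39
8497 "NASDAQ" "ADP" 1.42
8496 "NYSE"   "ADP" 1.39
8497 "NYSE"   "ADP" 1.42
end

* dup = number of OTHER observations sharing date, symbol and adjclose
duplicates tag date symbol adjclose, generate(dup)

* Drop the NASDAQ copy only where a duplicate exists
* (which exchange to drop is an arbitrary choice here)
drop if dup > 0 & exchange == "NASDAQ"

* Check: report any duplicates still remaining on the three variables
duplicates report date symbol adjclose
```

One caveat with this sketch: if two NASDAQ observations duplicated each other (with no NYSE counterpart), both would be dropped, so it may be worth checking for within-exchange duplicates first.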
Thank you,
Aaron
On Sun, Sep 30, 2012 at 8:34 PM, Nick Cox <[email protected]> wrote:
> I am not clear what advice you seek. If you don't care about
> -exchange-, you have duplicates you can drop, but not otherwise. If
> you -duplicates drop- them, -duplicates- will be indifferent to which
> -exchange- they are.
>
> You can also do this:
>
> duplicates tag date symbol adjclose, gen(tag)
> drop if tag & exchange == "NASDAQ"
>
> if you have a reason to drop one exchange and not another.
>
> Nick (original author of -duplicates-)
>
> On Mon, Oct 1, 2012 at 2:15 AM, Aaron Kirkman <[email protected]> wrote:
>
>> I have a dataset with about 20 million observations and I'd like to
>> remove duplicate observations from it. However, the observations are
>> only duplicated in the --date--, --symbol--, and --adjclose--
>> variables, not the -exchange- variable, as shown.
>>
>> date  exchange  symbol  adjclose
>> 8496  NASDAQ    ADP     1.39
>> 8497  NASDAQ    ADP     1.42
>> 8498  NASDAQ    ADP     1.41
>> 8501  NASDAQ    ADP     1.39
>> 8502  NASDAQ    ADP     1.4
>> 8503  NASDAQ    ADP     1.41
>> 8504  NASDAQ    ADP     1.45
>> 8505  NASDAQ    ADP     1.44
>> 8508  NASDAQ    ADP     1.43
>> 8509  NASDAQ    ADP     1.4
>> 8496  NYSE      ADP     1.39
>> 8497  NYSE      ADP     1.42
>> 8498  NYSE      ADP     1.41
>> 8501  NYSE      ADP     1.39
>> 8502  NYSE      ADP     1.4
>> 8503  NYSE      ADP     1.41
>> 8504  NYSE      ADP     1.45
>> 8505  NYSE      ADP     1.44
>> 8508  NYSE      ADP     1.43
>> 8509  NYSE      ADP     1.4
>>
>> I can identify observations that are duplicated in the --date--,
>> --symbol--, and --adjclose-- variables using --duplicates list date
>> symbol adjclose--, but I'm unsure how to drop the observations from
>> one specific exchange programmatically.
>>
>> It doesn't matter which exchange is dropped, as long as all the
>> observations from that exchange are dropped if the stock appears on
>> multiple exchanges. Is --duplicates-- the wrong way to go about doing
>> this? If no simple solution exists, I could always generate a new
>> variable based on --exchange-- and --symbol-- and use that as a panel
>> variable.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/