Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Identify duplicate observations by a varlist, then drop them based on other variables
From
Nick Cox <[email protected]>
To
[email protected]
Subject
Re: st: Identify duplicate observations by a varlist, then drop them based on other variables
Date
Mon, 1 Oct 2012 02:34:06 +0100
I am not clear what advice you seek. If you don't care about
-exchange-, you have duplicates you can drop, but not otherwise. If
you -duplicates drop- them, -duplicates- will be indifferent to which
-exchange- they are.
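As a minimal sketch of that first case, assuming the variable names used in the question quoted below (date, symbol, adjclose): when a varlist is given, -duplicates drop- requires the -force- option, because the observations it drops may still differ on other variables (here -exchange-).

```stata
* Sketch only: variable names are taken from the question quoted below.
* Report how many surplus copies exist before dropping anything.
duplicates report date symbol adjclose

* Keep the first observation in each group of identical date, symbol
* and adjclose, in the current sort order, whatever its exchange.
duplicates drop date symbol adjclose, force
```

Which exchange survives then depends only on the sort order of the data at the time.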
You can also do this:
duplicates tag date symbol adjclose, gen(tag)
drop if tag & exchange == "NASDAQ"
if you have a reason to drop one exchange and not another.
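Note that -duplicates tag- stores the number of surplus copies, so tag is 0 for unique observations and the condition above is true only for duplicated ones. If it does not matter which exchange is kept but the choice should be reproducible, one possible alternative is sketched here. It assumes the same variable names as the question and is not part of -duplicates-; it uses -bysort- to keep the exchange that sorts first alphabetically within each group.

```stata
* Sketch only: within each group of identical date, symbol and adjclose,
* sort by exchange and keep the first observation (NASDAQ before NYSE).
bysort date symbol adjclose (exchange): keep if _n == 1
```

This keeps the same exchange throughout for a stock only if that stock is duplicated across the same exchanges on every date.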
Nick (original author of -duplicates-)
On Mon, Oct 1, 2012 at 2:15 AM, Aaron Kirkman <[email protected]> wrote:
> I have a dataset with about 20 million observations and I'd like to
> remove duplicate observations from it. However, the observations are
> only duplicated in the --date--, --symbol--, and --adjclose--
> variables, not the -exchange- variable, as shown.
>
> date exchange symbol adjclose
> 8496 NASDAQ ADP 1.39
> 8497 NASDAQ ADP 1.42
> 8498 NASDAQ ADP 1.41
> 8501 NASDAQ ADP 1.39
> 8502 NASDAQ ADP 1.4
> 8503 NASDAQ ADP 1.41
> 8504 NASDAQ ADP 1.45
> 8505 NASDAQ ADP 1.44
> 8508 NASDAQ ADP 1.43
> 8509 NASDAQ ADP 1.4
> 8496 NYSE ADP 1.39
> 8497 NYSE ADP 1.42
> 8498 NYSE ADP 1.41
> 8501 NYSE ADP 1.39
> 8502 NYSE ADP 1.4
> 8503 NYSE ADP 1.41
> 8504 NYSE ADP 1.45
> 8505 NYSE ADP 1.44
> 8508 NYSE ADP 1.43
> 8509 NYSE ADP 1.4
>
> I can identify observations that are duplicated in the --date--,
> --symbol--, and --adjclose-- variables using -- duplicates list date
> symbol adjclose--, but I'm unsure how to drop the observations from
> one specific exchange programmatically.
>
> It doesn't matter which exchange is dropped, as long as all the
> observations from that exchange are dropped if the stock appears on
> multiple exchanges. Is --duplicates-- the wrong way to go about doing
> this? If no simple solution exists, I could always generate a new
> variable based on --exchange-- and --symbol-- and use that as a panel
> variable.