Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Identify duplicate observations by a varlist, then drop them based on other variables
From: Aaron Kirkman <[email protected]>
To: [email protected]
Subject: st: Identify duplicate observations by a varlist, then drop them based on other variables
Date: Sun, 30 Sep 2012 20:15:17 -0500
Dear Statalist,
I have a dataset with about 20 million observations and I'd like to
remove duplicate observations from it. However, the observations are
duplicated only in the -date-, -symbol-, and -adjclose- variables,
not in the -exchange- variable, as shown below.
date   exchange   symbol   adjclose
8496   NASDAQ     ADP      1.39
8497   NASDAQ     ADP      1.42
8498   NASDAQ     ADP      1.41
8501   NASDAQ     ADP      1.39
8502   NASDAQ     ADP      1.4
8503   NASDAQ     ADP      1.41
8504   NASDAQ     ADP      1.45
8505   NASDAQ     ADP      1.44
8508   NASDAQ     ADP      1.43
8509   NASDAQ     ADP      1.4
8496   NYSE       ADP      1.39
8497   NYSE       ADP      1.42
8498   NYSE       ADP      1.41
8501   NYSE       ADP      1.39
8502   NYSE       ADP      1.4
8503   NYSE       ADP      1.41
8504   NYSE       ADP      1.45
8505   NYSE       ADP      1.44
8508   NYSE       ADP      1.43
8509   NYSE       ADP      1.4
I can identify observations that are duplicated in the -date-,
-symbol-, and -adjclose- variables using -duplicates list date
symbol adjclose-, but I'm unsure how to drop the observations from
one specific exchange programmatically.
It doesn't matter which exchange is dropped, as long as all the
observations from that exchange are dropped if the stock appears on
multiple exchanges. Is -duplicates- the wrong way to go about doing
this? If no simple solution exists, I could always generate a new
variable based on -exchange- and -symbol- and use that as a panel
variable.
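For concreteness, here is a minimal sketch of one way the requirement
above might be met. It is not a tested solution for the full dataset.
It assumes the variable names in the listing, and it assumes that
keeping whichever exchange sorts first within each symbol is
acceptable, since any one exchange will do.

```stata
* Sketch only: variable names are taken from the listing above.
* Within each symbol, sort by exchange and keep only the rows whose
* exchange matches the first one in that sort order. All rows for
* that symbol from any other exchange are dropped.
bysort symbol (exchange): keep if exchange == exchange[1]

* Check afterwards: if the only duplicates were across exchanges,
* this should report no surplus observations.
duplicates report date symbol adjclose
```

Note that this drops by -symbol- alone and does not test whether the
rows are in fact duplicated on -date- and -adjclose-. If -exchange-
has missing values, how they sort would affect which exchange is
kept, so that case would need checking first.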
Thank you,
Aaron