st: Duplicate observations
From: emanuele mazzini <[email protected]>
To: [email protected]
Subject: st: Duplicate observations
Date: Mon, 10 Mar 2014 19:30:52 +0100
Hello everybody,
I have a problem with duplicate observations that I find puzzling.
I have data on country pairs by year and I am interested in two
specific variables: a date and a variable that I call x_1.
Specifically, my data look like this:
reporter partner year date x_1
Albania Austria 1980 6dec1980 n_1
Albania Austria 1980 15nov1980 n_1
. . .
. . .
. . .
As you may have noticed, these observations differ only by date,
and I need to drop the duplicates so as to keep the most recent one (in
this case the first one, dated 6dec1980).
I ran the following commands:
duplicates tag reporter partner year, generate(dup)
by reporter partner year (x_1 -date), sort: gen duplicates=_n
so as to be able to identify duplicates and then, among those with
dup > 0, drop those for which duplicates > 1.
This method was suggested in this thread (I take this opportunity to
say thanks again), but it does not seem to work for some observations.
Take, for instance, the following example:
reporter partner year date x_1 dup duplicates
Albania Germany 1967 08apr1967 n_1 1 1
Albania Germany 1967 17dec1967 n_1 1 2
As you may notice, Stata marks the observation dated 17dec1967 as the
one with duplicates > 1 (which would then be dropped), while I expected
the opposite: the 08apr1967 observation should be the one dropped.
Can anyone explain why and, if possible, tell me how to deal with this issue?
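To make the intended result concrete, here is a minimal sketch that rebuilds the two examples above and keeps only the most recent date within each reporter-partner-year group. It rests on assumptions not stated in the post: that the date is read in as text (held in a hypothetical variable sdate) and converted to a numeric Stata daily date, and that x_1 is a string. Rather than sorting in descending order, it sorts ascending by date and keeps the last row of each group.

```stata
* Rebuild the example rows from the post (sdate is a hypothetical helper variable)
clear
input str10 reporter str10 partner int year str9 sdate str3 x_1
"Albania" "Austria" 1980 "6dec1980"  "n_1"
"Albania" "Austria" 1980 "15nov1980" "n_1"
"Albania" "Germany" 1967 "08apr1967" "n_1"
"Albania" "Germany" 1967 "17dec1967" "n_1"
end

* Assumption: the date arrives as text; convert it to a numeric daily date
gen long date = date(sdate, "DMY")
format date %td
drop sdate

* Sort ascending by date within each group; the last row is the most recent
bysort reporter partner year x_1 (date): keep if _n == _N

list, sepby(reporter partner year)
```

If date were stored as a string in the real data, sorting on it would be alphabetical rather than chronological, so the conversion step is worth checking first.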
Thank you very much in advance,
Emanuele