Ada Ma wrote:
> I want to know how to drop some observations. For example,
> I have the following data; the numbers are identifiers:
>
> Original Fake
> 1 23
> 14 235
> 35 90
> 45 87
> 87 45
>
> Now the 4th and 5th pairs are the exact opposite of one
> another. I want to drop the 5th observation, but I don't
> know how I can use the data to flag up such observations
> so that I can drop them. Could someone show me the light?
In the SSC archives, you can find -fndmtch2-,
which may be relevant. However, a more direct
approach appears possible, as these are numbers,
and indeed I guess that the other values you are
dealing with in practice are all integers, as in
your example.
. gen min = min(Original, Fake)
. gen max = max(Original, Fake)
. egen both = concat(min max), p(" ")
That is, (45 and 87) and (87 and 45)
both map to "45 87", and the problem is now
one of dropping one of each pair of
duplicates.
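
To see the mapping at work, here is a small sketch that types
in the five pairs from the question and applies the three
commands above. The -clear-, -input- and -list- steps are added
purely for illustration, on the assumption that you want to try
this on a scratch dataset; the lines are written without the
". " prompt so that they can be pasted into a do-file.

```stata
* Scratch copy of the example data (illustration only)
clear
input Original Fake
1 23
14 235
35 90
45 87
87 45
end

* Put the smaller id first and the larger id second
gen min = min(Original, Fake)
gen max = max(Original, Fake)

* Join the two into a single string key, separated by a space
egen both = concat(min max), p(" ")

list Original Fake both
```

In the listing, the 4th and 5th observations should both show
"45 87" in -both-, while the other three keys are distinct.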
In Stata 8, -duplicates- provides one
way of doing this. Before Stata 8,
try -findit duplicates- or -search duplicates-,
or just (in Stata 7)
. bysort both : keep if _n == 1
or (before Stata 7)
. sort both
. qui by both : keep if _n == 1
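
For the Stata 8 route, a minimal sketch with -duplicates- might
look like this. It assumes that -both- has already been created
as above. Because a variable list is given, -duplicates drop-
insists on the -force- option, as observations that agree on
-both- may still differ on other variables.

```stata
* How many observations share the same key?
duplicates report both

* Keep the first observation for each key and drop the rest
duplicates drop both, force
```

As with the -bysort- version, which of the two reversed
observations survives depends on the current order of the data,
so -sort- first if you care about which one is kept.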
Note that this does not assume, as do some
suggested solutions in this thread, that the reversed
pairs occur as adjacent observations.
This will also catch cases in which the
original and fake ids are the same. You may
or may not want to -drop- those.
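
Note that a pair such as (45 and 45) maps to "45 45", so one
copy of it survives the step above. If you decide you do not
want such pairs at all, one possibility, sketched here on the
assumption that the variables are named as in your example, is
to look at them first and then drop them:

```stata
* Inspect observations where both ids are identical
list Original Fake if Original == Fake

* Remove them if they are not wanted
drop if Original == Fake
```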
Nick