If observations are duplicates, the choice of
which to keep can be difficult...
-duplicates- arrived with Stata 8. Some
users were already in the habit of using
various user-written programs published
in the STB or on SSC, including -unique-,
-finddup-, -dups- and various others.
If they serve your purpose, fine.
But you no doubt are aware that observations
can be duplicates with respect to some
variables -- in your case -id- and -year- --
but differ with respect to other variables.
-finddup- offers no facilities for dropping
duplicates. It is an inspection program,
and gives information which can be used
to decide on what to -drop-.
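
The same kind of inspection can be done with the official -duplicates- subcommands before anything is dropped. What follows is only a sketch: the toy data and the variable -income- are made up for illustration and are not taken from your dataset.

```stata
* Toy panel, invented for illustration only
clear
input id year income
1 2000 10
1 2001 12
1 2001 15
2 2000 20
2 2000 20
end

* Count how many observations are surplus with respect to id and year
duplicates report id year

* Tag each observation with the number of other observations sharing its id-year
duplicates tag id year, generate(dup)

* Look at the problem cases side by side before deciding what to drop
sort id year
list if dup > 0, sepby(id year)
```

Here -dup- is 0 for observations that are unique on -id- and -year-, so -list if dup > 0- shows only the groups that need a decision.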
The intent of -duplicates- is to provide
a more general tool, including functionality
for -drop-ping duplicates. But -duplicates-
will not let you go
. duplicates drop id year
whenever other variables also exist. You
must spell out
. duplicates drop id year, force
as a reminder that you may be losing information.
In this way -duplicates- is designed to be
potentially destructive, but also to inhibit
accidental loss of real information.
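
If you would rather make the choice of which observation to keep explicit than rely on -force-, one possibility is sketched below. The rule used here (keep the observation with the highest -income- within each -id- and -year-) is purely an assumption for illustration, as are the toy data; substitute whatever rule suits your problem.

```stata
* Toy panel, invented for illustration only
clear
input id year income
1 2000 10
1 2001 12
1 2001 15
2 2000 20
2 2000 20
end

* Observations identical on ALL variables can go without -force-:
* nothing is lost by dropping exact copies
duplicates drop

* For the remaining id-year duplicates, state the rule yourself:
* sort on income within id-year and keep the last (highest) one.
* Note that missing values sort last in Stata, so check for them first.
bysort id year (income): keep if _n == _N

* Confirm that id and year now uniquely identify observations
isid id year
```

-isid- stops with an error if -id- and -year- still fail to identify observations uniquely, so it is a cheap check that the rule did what was intended.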
Nick
[email protected]
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of joe J.
> Sent: 21 April 2004 11:08
> To: [email protected]
> Subject: RE: st: RE: -finddup- for panel?
>
>
> Stata's official -duplicates- command also helps to identify
> duplicate observations. But I have a feeling that -finddup- is
> useful when one has to decide which among the duplicates to
> include and which to exclude (for later use, say) while
> generating a duplicate-free data set.
>