Fred Wolfe
>
> As the author of -finddup-, I have to agree with Nick that -duplicates- has
> more bells and whistles and seems to do everything that -finddup- does.
>
> Joe J., however, finds that -finddup- is useful when one has to decide
> which of the duplicates to include and which to exclude. I agree. I
> haven't used -duplicates- much and I may be mistaken about its capabilities
> in tagging duplicates. I believe that -duplicates- tags the duplicated
> observation with a number that represents the number of duplicates.
Correct.
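A minimal sketch of that tagging, using a made-up data set: the
variable names -id-, -score- and -ndup- are invented here for
illustration only.

```stata
* toy data: id 1 is unique, id 2 occurs twice, id 3 occurs three times
clear
input id score
1 10
2 20
2 25
3 30
3 31
3 35
end

* ndup holds the number of surplus copies of each id:
* 0 for a unique id, 1 for each member of a pair, 2 for each of a triple
duplicates tag id, generate(ndup)

list, sepby(id)
```

Every observation within a group gets the same value, so the tag
alone does not distinguish the first duplicate from the second.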
> -finddup- tags the duplicates with a sequential number based on a sorted
> list such that if there are 3 duplicates they will be numbered 1,2,3 (for
> example).
>
> I find that feature to be very useful. In situations where there are
> duplicated keys but not duplicated observations, one may need to decide
> which of the duplicates to retain and which to drop. Being able to tag them
> with a sequential number facilitates that task. For example, -drop if
> inrange(dupval, 2,99)-
>
> Here are some examples. We survey people with arthritis. Inexplicably, some
> persons complete 2 surveys (!) and are assigned duplicate keys for the
> major data set keys. The question arises: which observation should be
> deleted (or retained), as they are not true duplicates? One might want to make
> a rule to delete the first observation or the second, or might want to look
> at the data before making such a choice. For me, -finddup- is a little easier
> to use in that circumstance. Nick will correct me if I have misread
> -duplicates-. Perhaps sequential numbering of the duplicates could be added
> to -duplicates-.
That is an interesting idea for StataCorp to consider.
Sequential tagging can, however, be done in at least two
ways, starting at 0 and starting at 1. Fred likes 1;
others might prefer 0.
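In the meantime, sequential numbering can be done by hand with -by:-.
What follows is only a sketch on made-up data, not a reproduction of
what -finddup- does internally; the names -id-, -score-, -seq1- and
-seq0- are invented for illustration.

```stata
* toy data: id 1 is unique, id 2 occurs twice, id 3 occurs three times
clear
input id score
1 10
2 20
2 25
3 30
3 31
3 35
end

* number observations within each id, in order of score, starting at 1
bysort id (score): generate seq1 = _n

* the same numbering, starting at 0
generate seq0 = seq1 - 1

list, sepby(id)

* one possible rule: keep only the first observation for each id
drop if seq1 > 1
```

Sorting within -id- on some other variable (here -score-) makes the
numbering reproducible; which variable is appropriate depends on the
rule you want for choosing among the duplicates.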
> -finddup- also does an un-Stata thing. It automatically creates a variable
> called -dupval-. -duplicates- forces you to name the new variable. I like
> -dupval- because I always remember its name, sort of like -_merge- that
> Stata creates automatically.
Indeed. Although there are Stata commands which use
special names, that behaviour is not indulged in
without a strong case.
Nick