Fred Wolfe
> 
> As the author of -finddup-, I have to agree with Nick that -duplicates-
> has more bells and whistles and seems to do everything that -finddup-
> does.
> 
> Joe J., however, finds that -finddup- is useful when one has to decide
> which among the duplicates to include and which to exclude. I agree. I
> haven't used -duplicates- much and I may be mistaken about its
> capabilities in tagging duplicates. I believe that -duplicates- tags
> the duplicated observation with a number that represents the number of
> duplicates.
Correct. 
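
A minimal sketch of that tagging, assuming a toy data set in which
-id- is the key; the variable names -id-, -visit- and -ndup- are made
up for illustration:

```stata
* Toy data: ids 1 and 3 occur more than once, id 2 is unique
clear
input id visit
1 1
1 2
2 1
3 1
3 2
3 3
end

* ndup holds the number of surplus copies of each id:
* 0 for the unique id 2, 1 for each id 1, 2 for each id 3
duplicates tag id, generate(ndup)
list, sepby(id)
```

Every observation in a set gets the same value, so -ndup- says how
many duplicates there are but not which one is which.
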
> -finddup- tags the duplicates with a sequential number based on a
> sorted list such that if there are 3 duplicates they will be numbered
> 1, 2, 3 (for example).
> 
> I find that feature to be very useful. In situations where there are
> duplicated keys but not duplicated observations, one may need to decide
> which of the duplicates to retain. Being able to tag them with a
> sequential number facilitates that task. For example,
> -drop if inrange(dupval, 2, 99)-.
> 
> Here are some examples. We survey people with arthritis. Inexplicably,
> some persons complete 2 surveys (!) and are assigned duplicate keys for
> the major data set keys. The question arises: which observation should
> be deleted (retained), as they are not true duplicates? One might want
> to make a rule to delete the first observation or the second, or might
> want to look at the data before making such a choice. For me, -finddup-
> is a little easier to use in that circumstance. Nick will correct me if
> I have misread -duplicates-. Perhaps sequential numbering of the
> duplicates could be added to -duplicates-.
That is an interesting idea for StataCorp to consider. 
Sequential tagging can, however, be done in at least two 
ways, starting at 0 and starting at 1. Fred likes 1; 
others might prefer 0. 
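
In the meantime, either convention can be sketched with -by:- and
-_n-. This is only an illustration under assumptions: the toy
variables -id- (the key) and -visit- (the order within each key), and
the names -seq1- and -seq0-, are invented here and are not what
-finddup- itself does internally.

```stata
* Toy data: ids 1 and 3 occur more than once, id 2 is unique
clear
input id visit
1 1
1 2
2 1
3 1
3 2
3 3
end

* Number the observations within each id, in order of visit
* seq1 starts at 1 (1, 2, 3, ...); seq0 starts at 0 (0, 1, 2, ...)
bysort id (visit): generate seq1 = _n
bysort id (visit): generate seq0 = _n - 1
list, sepby(id)

* One possible rule: keep only the first observation for each id
drop if seq1 > 1
```

Looking at the -list- output before the -drop- gives a chance to
inspect the data and choose a different rule, such as keeping the last
observation with -bysort id (visit): keep if _n == _N-.
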
> -finddup- also does an un-Stata thing. It automatically creates a
> variable called -dupval-. -duplicates- forces you to name the new
> variable. I like -dupval- because I always remember its name, sort of
> like -_merge- that Stata creates automatically.
Indeed. Although there are Stata commands which use 
special names, that behaviour is not indulged in 
without a strong case. 
Nick 