Michael's specific questions and various helpful answers from Martin and
others continue, but the general question here merits further comment.
-duplicates- looks for observations that are duplicates on a varlist. If
you don't name a varlist, the varlist is all variables. If you do name a
varlist, only those variables are compared.
Observations are duplicates if they have identical values on all the
variables concerned, so duplicates come in groups of two or more
observations.
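To make the definition concrete, here is a small sketch on a made-up dataset. The variables -a- and -b- and the names -dup_ab- and -dup_a- are invented for illustration only.

```stata
clear
input a b
1 1
1 2
1 1
2 2
end

* tag duplicates on a and b jointly: only the two (1, 1) observations match
duplicates tag a b, generate(dup_ab)

* tag duplicates on a alone: all three observations with a == 1 match
duplicates tag a, generate(dup_a)

list, clean
```

-duplicates tag- stores the number of other observations identical to each observation, so unique observations get 0. Here -dup_ab- should be 1 for the two observations with a == 1 and b == 1 and 0 otherwise, while -dup_a- should be 2 wherever a == 1.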
So given

    duplicates <whatever> headroom trunk

there is absolutely no question about it. -duplicates- does _not_ look
for duplicates on either -headroom- or -trunk-. It only looks for
observations that are identical on _both_ variables jointly.
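One way to check this for yourself on the auto data is to tag jointly and separately and compare. This is only a sketch; the names -dup_both-, -dup_head- and -dup_trunk- are made up here.

```stata
sysuse auto, clear

* joint (AND) tagging, as in the question
duplicates tag headroom trunk, generate(dup_both)

* tagging on each variable on its own
duplicates tag headroom, generate(dup_head)
duplicates tag trunk, generate(dup_trunk)

* any observation duplicated on both variables jointly must also be
* duplicated on each variable separately
assert dup_head > 0 & dup_trunk > 0 if dup_both > 0

* the converse need not hold: these observations are duplicated on at
* least one variable but not on the pair
count if dup_both == 0 & (dup_head > 0 | dup_trunk > 0)
```

If that -count- is greater than zero, the AND and OR interpretations pick out different observations in this dataset.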
If you want the OR interpretation, you have to run -duplicates-
separately on each variable and combine the results.
Here is a sketch.
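    * isanydup will be 1 if an observation is duplicated on any variable
    gen byte isanydup = 0
    foreach v of var <varlist> {
        * work counts the other observations sharing this value
        duplicates tag `v', gen(work)
        replace isanydup = isanydup | work
        drop work
    }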
Then look at -isanydup-.
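For the question at hand, the sketch might be filled in as follows. This assumes the auto data and the two variables from Michael's example; -isanydup- and -work- are just working names.

```stata
sysuse auto, clear

gen byte isanydup = 0
foreach v of var headroom trunk {
    * work > 0 whenever another observation has the same value of this variable
    duplicates tag `v', gen(work)
    * nonzero work counts as true, so isanydup stays 0 or 1
    replace isanydup = isanydup | work
    drop work
}

tabulate isanydup
list foreign headroom trunk if isanydup, clean
```

Because -work- holds a count rather than an indicator, the logical OR is what keeps -isanydup- a 0/1 variable.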
Nick
Michael McCulloch wrote:
Thanks Martin. Am I correct in understanding that, in this revised
example immediately below, the command:
. duplicates tag headroom trunk, generate(dup)
would tag as dup>0 all sets of observations for which there are
duplicates of:
headroom *AND* trunk
and not just those for which there are duplicates of:
headroom *OR* trunk
?
It looks that way on visual inspection of this example's output, but
I want to make sure before applying it to my much larger dataset.
clear
sysuse auto
list foreign headroom trunk
duplicates tag headroom trunk, generate(dup)
sort headroom trunk
list foreign headroom trunk dup if dup>0, clean