As the original author of -duplicates- (which in turn owes
much to earlier joint work with Thomas Steichen) I have
to say that its behaviour is exactly right here. Indeed
I would say the same if I had never touched the code.
The idea of a duplicate in -duplicates- is that observations
are identical (on the variables specified). How could it
be otherwise? Thus -duplicates- is indeed irrelevant to your problem.
Your problem is different but is soluble in Stata terms
if you can give exact rules for what kind of tolerance you allow
_within groups of observations_. As with any kind of clustering
problem, specifying a distance or difference tolerance is only
part of the problem, as joining or merging rules need to be
specified too.
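
To make that concrete, here is a minimal sketch of one possible rule. It
is not the only way to do it. It assumes a panel identifier called -id-
and a daily date variable called -date- (both names are hypothetical),
and it takes "one year" to mean 365 days, so a gap that spans 29 February
would need a wider tolerance. The rule is purely pairwise: an observation
is tagged if any other observation in the same panel lies 365 days away,
give or take the tolerance. It says nothing about how chains of tagged
observations should be joined.

```stata
* Sketch only: -id- and -date- are assumed variable names.
* "One year" is taken to be 365 days; tol is the allowed deviation in days.
local tol 3
gen byte annual = 0

* For each observation, count observations in the same panel whose date
* is 365 days away (plus or minus tol), either before or after.
quietly forvalues i = 1/`=_N' {
    count if id == id[`i'] & inrange(abs(date - date[`i']), 365 - `tol', 365 + `tol')
    if r(N) > 0 replace annual = 1 in `i'
}
```

Because the condition is symmetric, both members of a matching pair end
up tagged. The loop passes over the data once per observation, so on a
very large dataset it will be slow. Treat it as a statement of the rule
rather than an efficient solution.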
Nick
[email protected]
> I am working with a very large panel dataset, and would like to tag
> observations that repeat annually (as opposed to the odd or
> unscheduled observation). My rule for tagging observations is
> something like: if another observation falls exactly one year before
> or after the current observation (-/+ 3 days, to deal with minor
> deviations due to, say, dates that fall on weekends), tag both
> observations. I explored the use of "duplicates" and splitting the
> dates into year, month, and day to little effect (it can be used only
> for exact matches rather than ranges, and will tag observations with
> the same day and month in non-consecutive years).
> Any help would be greatly appreciated.
>