Continuing the discussion on anonymization of ids, I wrote,
> There is no additional security to be gained by doing that.
> Ties do not
> matter in this case.
and Hendri Adriaens <[email protected]> just replied,
> It might not matter for security, but for my application it does. The
> information from the master data set (that will be anonymised) will have to
> be merged into a new dataset (to be anonymised with the mapping). If the
> mapping contains ties, -merge- wouldn't know which of the tied records to
> insert in the new dataset.
No.
There is no problem. The mapping will not contain ties even if ties
arise in the random numbers drawn. Every person in the data will
have a unique id. Nick Cox's point was that, if we reran the
program that generated the mapping, it is possible that it would create
a different mapping, and my point in response was that that doesn't
matter. In either case, the mapping will be unique. All that might
happen is that we get
uid newid
123-45-6789 100
999-99-9999 101
the first time we ran, and
uid newid
999-99-9999 100
123-45-6789 101
the second. It doesn't matter what the mapping is, however, because once
we set it, we will continue to use that mapping, and one is as good as
the other, cryptographically speaking.
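
To make that concrete, here is a minimal sketch in Python rather than
Stata. It assumes the new ids are assigned from the position of each
record after sorting on a random draw, not from the draw itself. The
function name build_mapping, the seeds, the starting value of 100, and
the third uid are all made up for illustration. The draws come from a
deliberately tiny range so that ties among them are likely.

```python
import random

def build_mapping(uids, seed, start=100):
    """Map each uid to a consecutive new id, in random order.

    Hypothetical helper: new ids come from the sort position,
    so tied random draws cannot produce tied new ids.
    """
    rng = random.Random(seed)
    order = list(uids)
    # Put the records in an arbitrary order first, so that records
    # with tied draws end up in no particular order after the sort.
    rng.shuffle(order)
    # Draw from {0, 1} on purpose: with three records, ties are certain.
    draws = {uid: rng.randint(0, 1) for uid in order}
    # Stable sort on the draw only; ties keep their shuffled order.
    order.sort(key=draws.get)
    # The new id is the position in the sorted list, hence unique.
    return {uid: start + i for i, uid in enumerate(order)}

# The first two uids are the ones from the example above; the third is invented.
uids = ["123-45-6789", "999-99-9999", "555-55-5555"]

first = build_mapping(uids, seed=1)
second = build_mapping(uids, seed=2)

# The two runs may or may not agree with each other, but each one
# assigns every uid its own new id.
for mapping in (first, second):
    assert len(set(mapping.values())) == len(uids)
```

Whichever run we keep, we save that mapping and reuse it for every
later merge.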
-- Bill
[email protected]