I am as fond of -duplicates- as my twin, but it
is just a convenience command.
bysort random : assert _N == 1
is a much more direct way of testing that random
numbers are unique.
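
A minimal sketch of wrapping that check in a retry loop, so that a
repeated draw leads to a redraw under a different seed, as Hendri
suggests below. The starting seed 12345 and the local macro names are
placeholders, not anything from this thread, and the fragment assumes
it is run from a do-file with one observation per uid in memory:

    * placeholder starting seed; pick your own and keep it private
    local seed 12345
    local unique 0
    while !`unique' {
        set seed `seed'
        capture drop random
        gen double random = uniform()
        * assert exits with return code 9 if any value occurs twice
        capture bysort random : assert _N == 1
        if _rc == 0 {
            local unique 1
        }
        else if _rc == 9 {
            local ++seed
        }
        else {
            error _rc
        }
    }

On exit the data are sorted by random, so newid can be generated as _n
straight away.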
Nick
Hendri Adriaens
> Hi William,
>
> Thanks, that should work, although, as Nick Cox mentioned, there is a
> tiny probability that you generate the same number twice. So, one might
> need a check afterwards for duplicates and redo the process with a
> different seed if there are any.
William Gould, StataCorp LP
> > Hendri Adriaens has a dataset and writes,
> >
> > > I want to encrypt only a single variable, to anonymize data.
> >
> > Here is what I recommend.
> >
> > Let's call the data actual.dta and assume it has variable uid, which is
> > the official user identification number that we want to encrypt.
> > uid can be a string or numeric, I don't care. uid might contain
> >
> > 136980408 recorded as a double or long, or
> > "136-98-408" recorded as a string, or even
> > "James Smith" recorded as a string.
> >
> > In what follows, we will allow repeated values of uid in the
> > dataset. What we are going to do is come up with new id numbers, use
> > those, and lock up the mapping from uid to newid.
> >
> > Here's step 1:
> >
> > . use actual, clear
> > . keep uid
> > . sort uid
> > . by uid: keep if _n==1
> >
> > . set seed _______        <- fill this in with a random number
> > . gen double random = uniform()
> > . sort random
> > . gen long newid = _n
> >
> > . drop random
> > . sort uid
> > . save mapping, replace
> >
> > New dataset mapping.dta contains two variables: uid and the
> > corresponding newid. Next, we fix actual.dta for public consumption:
> >
> > . use actual
> > . sort uid
> > . merge uid using mapping
> > . assert _merge==3
> > . drop _merge uid
> > . save actual, replace
> >
> > Finally, we put mapping.dta in a safe place. I would write multiple
> > copies of mapping.dta on multiple CDs and put the CDs in multiple
> > safes. Dataset mapping contains all the secret information.
> >
> > Dataset actual.dta no longer contains uid; it contains newid.
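
For readers on Stata 11 or later, the steps quoted above can be
sketched with the newer -merge- syntax, which needs no prior sort and
no separate check of _merge. This is an illustrative adaptation rather
than William's own code: the seed is a placeholder, runiform() is used
as the newer name for uniform(), and -isid- stands in for the
uniqueness check.

    * build the mapping: one row per uid, in random order
    use actual, clear
    keep uid
    bysort uid : keep if _n == 1
    * placeholder seed; choose your own and keep it private
    set seed 12345
    gen double random = runiform()
    * stops with an error if any draw is repeated
    isid random
    sort random
    gen long newid = _n
    drop random
    save mapping, replace

    * attach newid to every observation, then remove uid
    use actual, clear
    merge m:1 uid using mapping, assert(match) nogenerate
    drop uid
    save actual, replace

Here assert(match) makes -merge- exit with an error unless every
observation is matched, which plays the role of assert _merge==3, and
nogenerate stops _merge from being created at all.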
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/