Ok, thank you Nick,
-Hendri.
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Wednesday, 13 June 2007 18:32
> To: [email protected]
> Subject: RE: st: RE: RE: Encryption of data
>
> I am as fond of -duplicates- as my twin, but it
> is just a convenience command.
>
> bysort random : assert _N == 1
>
> is a much more direct way of testing that random
> numbers are unique.
>
> Nick
> [email protected]
>
> Hendri Adriaens
>
> > Hi William,
> >
> > Thanks, that should work, although, as Nick Cox mentioned, there is a
> > tiny probability that you generate the same number twice. So one might
> > need to check for duplicates afterwards and redo the process with a
> > different seed if there are any.
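> >
> > A minimal sketch of that check-and-redo loop, written as do-file lines.
> > The starting seed 20070613 and the local macro names are arbitrary
> > choices for illustration, not part of William's recipe:
> >
> > local seed 20070613
> > local ok 0
> > while !`ok' {
> >     set seed `seed'
> >     capture drop random
> >     gen double random = uniform()
> >     * _rc is 0 only if every value of random is unique
> >     capture bysort random: assert _N == 1
> >     local ok = (_rc == 0)
> >     if !`ok' {
> >         local seed = `seed' + 1
> >     }
> > }
> > display "seed used: `seed'"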
>
> William Gould, StataCorp LP
>
> > > Hendri Adriaens <[email protected]> has a dataset and writes,
> > >
> > > > I want to encrypt only a single variable, to anonymize data.
> > >
> > > Here is what I recommend.
> > >
> > > Let's call the data actual.dta and assume it has variable uid, which is
> > > the official user identification number that we want to encrypt.
> > > uid can be a string or numeric, I don't care. uid might contain
> > >
> > > 136980408 recorded as a double or long, or
> > > "136-98-408" recorded as a string, or even
> > > "James Smith" recorded as a string.
> > >
> > > In what follows, we will allow repeated values of uid in the
> > > dataset. What we are going to do is come up with new id numbers, use
> > > those, and lock up the mapping from uid to newid.
> > >
> > > Here's step 1:
> > >
> > > . use actual, clear
> > > . keep uid
> > > . sort uid
> > > . by uid: keep if _n==1
> > >
> > > . set seed _______ <- fill this in with a random number
> > > . gen double random = uniform()
> > > . sort random
> > > . gen long newid = _n
> > > . drop random
> > >
> > > . sort uid
> > > . save mapping, replace
> > >
> > > New dataset mapping.dta contains two variables: uid and the
> > > corresponding newid. Next, we fix actual.dta for public consumption:
> > >
> > > . use actual
> > > . sort uid
> > > . merge uid using mapping
> > > . assert _merge==3
> > > . drop _merge uid
> > > . save actual, replace
> > >
> > > Finally, we put mapping.dta in a safe place. I would write multiple
> > > copies of mapping.dta on multiple CDs and put the CDs in multiple safes.
> > > Dataset mapping contains all the secret information.
> > >
> > > Dataset actual.dta no longer contains uid; it contains newid.
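> > >
> > > Should an authorized person later need the original uid back, the
> > > mapping can be merged in the other direction. This is only a sketch in
> > > the same style as above; the file name mapping_by_newid is made up for
> > > illustration, and it assumes mapping.dta has been retrieved from the safe:
> > >
> > > . use mapping, clear
> > > . sort newid
> > > . save mapping_by_newid, replace
> > >
> > > . use actual, clear
> > > . sort newid
> > > . merge newid using mapping_by_newid
> > > . assert _merge==3
> > > . drop _merge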
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
>