Hi William,
Thanks, that should work, although, as Nick Cox mentioned, there is a tiny
probability of generating the same random number twice. So one might need to
check for duplicates afterwards and, if there are any, redo the process with a
different seed.
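Below is a minimal sketch of that check. It is written in Python rather than Stata, purely for illustration. The function name `assign_random_keys`, the retry limit and the seed values are hypothetical. The function draws one uniform random key per distinct uid. If any two keys coincide, it redoes the draw with a different seed.

```python
import random


def assign_random_keys(uids, seed, max_tries=10):
    """Draw one uniform random key per distinct uid; if any two keys
    coincide, redo the whole draw with a different seed (hypothetical helper)."""
    unique_uids = sorted(set(uids))
    for attempt in range(max_tries):
        current_seed = seed + attempt              # a different seed on each retry
        rng = random.Random(current_seed)
        keys = {u: rng.random() for u in unique_uids}
        if len(set(keys.values())) == len(keys):   # no duplicate keys drawn
            return keys, current_seed
    raise RuntimeError("duplicate keys on every try; pick another seed")


keys, used_seed = assign_random_keys(["136-98-408", "James Smith", "136-98-408"], seed=12345)
```

The check matters most when the random draws themselves are used as the new ids. In Bill's scheme below, newid is `_n` after sorting on the random variable, so newid is unique by construction. A tie in the random draws would only make the relative order of the tied rows arbitrary.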
But thanks for your help, best regards,
-Hendri.
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> William Gould, StataCorp LP
> Sent: Wednesday, 13 June 2007 17:54
> To: [email protected]
> Subject: Re: st: RE: RE: Encryption of data
>
> Hendri Adriaens <[email protected]> has a dataset and writes,
>
> > I want to encrypt only a single variable, to anonymize data.
>
> Here is what I recommend.
>
> Let's call the data actual.dta and assume it has a variable uid, which is
> the official user identification number that we want to encrypt.
> uid can be string or numeric, I don't care. uid might contain
>
> 136980408 recorded as a double or long, or
> "136-98-408" recorded as a string, or even
> "James Smith" recorded as a string.
>
> In what follows, we will allow repeated values of uid in the dataset.
> What we are going to do is come up with new id numbers, use those,
> and lock up the mapping between uid and newid.
>
> Here's step 1:
>
> . use actual, clear
> . keep uid
> . sort uid
> . by uid: keep if _n==1
>
> . set seed _______        <- fill this in with a random number
> . gen double random = uniform()
> . sort random
> . gen long newid = _n
>
> . sort uid
> . save mapping, replace
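Step 1 can be mirrored outside Stata roughly as follows. This is a Python sketch, not Bill's code. The function name `build_mapping` and the seed value are hypothetical. The sketch keeps one copy of each distinct uid and puts the copies in random order with a seeded generator. That shuffle stands in for sorting on a uniform random variable. It then numbers the shuffled values 1, 2, ... just as `gen long newid = _n` does.

```python
import random


def build_mapping(uids, seed):
    """Return a dict uid -> newid, where newid is a random permutation of 1..K."""
    # keep one copy of each uid (like: sort uid / by uid: keep if _n==1);
    # sorting first makes the result depend only on the seed, not on set order
    unique_uids = sorted(set(uids), key=str)
    # put them in random order (like: set seed / gen random / sort random)
    rng = random.Random(seed)
    rng.shuffle(unique_uids)
    # number them in that order (like: gen long newid = _n)
    return {uid: newid for newid, uid in enumerate(unique_uids, start=1)}


mapping = build_mapping(["136-98-408", "James Smith", "136-98-408"], seed=2007)
```

The sort before the shuffle matters because Python's iteration order over a set of strings can change between runs. Without the sort, the same seed would not always give the same mapping.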
>
> New dataset mapping.dta contains two variables: uid and the corresponding
> newid. Next, we fix actual.dta for public consumption:
>
> . use actual
> . sort uid
> . merge uid using mapping
> . assert _merge==3
> . drop _merge uid
> . save actual, replace
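Step 2 might look like this in Python. It replaces uid with newid and checks that every row matches in both directions, which is the analogue of `assert _merge==3`. The sketch assumes the records are held as a list of dicts. The function name `anonymize` and the `income` variable are hypothetical.

```python
def anonymize(records, mapping):
    """Replace each record's uid with its newid (hypothetical helper)."""
    # no master row without a mapping entry (no _merge==1)
    missing = {rec["uid"] for rec in records} - set(mapping)
    if missing:
        raise KeyError(f"uids with no newid: {sorted(missing, key=str)}")
    # no mapping entry without a master row (no _merge==2)
    unused = set(mapping) - {rec["uid"] for rec in records}
    if unused:
        raise ValueError(f"mapping entries never used: {sorted(unused, key=str)}")
    public = []
    for rec in records:
        new_rec = {k: v for k, v in rec.items() if k != "uid"}  # like: drop uid
        new_rec["newid"] = mapping[rec["uid"]]
        public.append(new_rec)
    return public


records = [
    {"uid": "James Smith", "income": 100},
    {"uid": "136-98-408", "income": 200},
    {"uid": "James Smith", "income": 150},
]
public = anonymize(records, {"James Smith": 2, "136-98-408": 1})
```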
>
> Finally, we put mapping.dta in a safe place. I would write multiple copies
> of mapping.dta on multiple CDs and put the CDs in multiple safes. Dataset
> mapping.dta contains all the secret information.
>
> Dataset actual.dta no longer contains uid; it contains newid.
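Only whoever holds the mapping can go back from newid to uid, because newid is unique. The following minimal Python sketch inverts the mapping to restore uid. The function name `reidentify`, the sample data and the `income` variable are hypothetical.

```python
def reidentify(public_records, mapping):
    """Restore uid from newid using the secret mapping (uid -> newid)."""
    # newid values are unique, so the mapping can be inverted without loss
    inverse = {newid: uid for uid, newid in mapping.items()}
    restored = []
    for rec in public_records:
        new_rec = {k: v for k, v in rec.items() if k != "newid"}
        new_rec["uid"] = inverse[rec["newid"]]
        restored.append(new_rec)
    return restored


secret_mapping = {"136-98-408": 1, "James Smith": 2}
public = [{"newid": 2, "income": 100}, {"newid": 1, "income": 200}]
restored = reidentify(public, secret_mapping)
```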
>
> -- Bill
> [email protected]
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
>