

Re: st: AW: utility to create fake dataset?

From   Daljit Dhadwal <[email protected]>
To   [email protected]
Subject   Re: st: AW: utility to create fake dataset?
Date   Sun, 8 Nov 2009 10:06:23 -0800

It sounds like you're trying to create anonymized data sets.  There
are lots of different names for the techniques for doing this: data
masking, data anonymization, data obfuscation, data de-identification,
data depersonalization, data scrubbing, and data scrambling.
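
As a rough illustration of one of these techniques (data scrambling), here
is a minimal sketch in Python rather than Stata. The function name
scramble_columns and the sample records are made up for illustration. It
shuffles each column independently, so every variable keeps its own set of
values while the link between values in the same row is broken. This alone
is not a privacy guarantee: rare or identifying values still appear in the
output.

```python
import random


def scramble_columns(rows, seed=None):
    """Return a scrambled copy of rows (a list of dicts sharing the same keys).

    Each column is shuffled independently of the others, so per-column
    value counts are preserved but row-level combinations are not.
    """
    if not rows:
        return []
    rng = random.Random(seed)  # seeded generator, so the result is reproducible
    columns = list(rows[0].keys())
    shuffled = {}
    for col in columns:
        values = [row[col] for row in rows]
        rng.shuffle(values)  # in-place shuffle of this column only
        shuffled[col] = values
    # Reassemble rows from the independently shuffled columns.
    return [{col: shuffled[col][i] for col in columns} for i in range(len(rows))]


# Hypothetical records standing in for a real dataset.
real = [
    {"age": 34, "sex": "F", "sbp": 118},
    {"age": 51, "sex": "M", "sbp": 135},
    {"age": 47, "sex": "F", "sbp": 122},
    {"age": 62, "sex": "M", "sbp": 141},
]

fake = scramble_columns(real, seed=12345)
for row in fake:
    print(row)
```

The scrambled copy has the same variables, types, and number of
observations as the original, which is usually enough for developing code
locally. Results computed on it (correlations, regressions) will not
resemble those from the real data.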

Here’s the Wikipedia article on data masking:

Here’s a good PowerPoint presentation that discusses some of the
techniques used in data masking:



On Sun, Nov 8, 2009 at 9:32 AM, Martin Weiss <[email protected]> wrote:
> *************
> h clonevar
> *************
> comes to mind...
> Martin
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Jeph Herrin
> Sent: Sunday, November 8, 2009 18:20
> To: [email protected]
> Subject: st: utility to create fake dataset?
> I sometimes need to create a "fake" dataset that "looks"
> like an existing dataset. For example, a dataset that
> must, for health privacy reasons, remain on a remote server,
> and I would like to develop code locally to run on it.
> Or, I need to make mock tables to share with colleagues
> who need to remain blinded for now to actual study data.
> Usually, I just do something that seems "good enough", like
> sample 5%, expand 20, replace values with random values, etc.
> Or, in an extreme case, set obs to be twice the existing obs
> and keep the ones with missing data. But the first is not
> very satisfying when I need to reassure higher powers that
> I have a "dummy" dataset, and the second is not very helpful
> for writing final useable code.
> So, I'm thinking I'll write a utility to create a 'dummy'
> dataset from an existing dataset, but wondered if there was
> something out there already. Perhaps there is even a well
> established name for this process? My searches for "dummy"
> and "fake" dataset have not been fruitful.
> thanks,
> Jeph
> *
> *   For searches and help try:
> *
> *
> *
> *
