Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Adding randomness to a variable
From: Richard Williams <[email protected]>
To: [email protected], [email protected]
Subject: Re: st: Adding randomness to a variable
Date: Mon, 21 Oct 2013 11:41:20 -0500
At 10:04 AM 10/21/2013, Owen Gallupe wrote:
Hi,
Given Stata's random-number generator capabilities, I suspect
there is an easy solution to this that I just haven't managed to
track down. That said, is there any function that lets you
take an existing variable and add a small degree of randomness to it?
I'm thinking along the lines of a jitter option when generating a
variable. I know that this exact command doesn't actually exist, but a
command of the following form is what I'm looking for:
gen varx = jitter(var)
My idea is that it would take this:
5
6
7
8
9
And turn it into something like this:
4.73
6.11
6.80
8.34
9.09
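As the question notes, generate has no jitter() function, but the same effect can be sketched by adding uniform noise to the existing variable. This is a minimal sketch, assuming the variable is named var and that noise of at most 0.5 in either direction counts as "small" (both are illustrative choices). runiform() returns uniform draws between 0 and 1, so subtracting 0.5 centers the noise on zero.

```stata
* Sketch: "jitter" an existing variable by adding uniform noise.
* The variable name var and the half-width 0.5 are illustrative choices.
clear
set obs 5
gen var = _n + 4                          // reproduces the values 5, 6, 7, 8, 9
set seed 12345                            // make the random draws reproducible
gen varx = var + (runiform() - 0.5)       // noise between -0.5 and 0.5
list var varx
```

Replacing runiform() - 0.5 with rnormal(0, .3) would give normally distributed jitter instead; the standard deviation .3 is again an arbitrary choice.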
I'm aware that the following two approaches produce something
similar, but my idea is to manually create a variable that has the
exact properties I want for teaching purposes and then add a little
"error" to it.
a)
gen varx = .5*var1 + .8660254*var2
b)
clear
matrix c = (1.00, 0.30, -0.25, -0.10, 0.10, 0.20 \ ///
0.30, 1.00, -0.15, -0.10, 0.12, 0.35 \ ///
-0.25, -0.15, 1.00, 0.13, -0.08, -0.16 \ ///
-0.10, -0.10, 0.13, 1.00, 0.06, -0.14 \ ///
0.10, 0.12, -0.08, 0.06, 1.00, 0.001 \ ///
0.20, 0.35, -0.16, -0.14, 0.001, 1.00)
corr2data var1 var2 var3 var4 var5 var6, n(2000) corr(c)
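Option a) relies on a standard identity. If var1 and var2 are standardized (mean 0, variance 1) and uncorrelated, then .5*var1 + b*var2 has variance .25 + b^2, so choosing b = sqrt(1 - .5^2) = sqrt(.75), about .8660254, gives the new variable unit variance and a correlation of .5 with var1. The sketch below assumes var1 and var2 are created with corr2data from an identity correlation matrix (N = 2000 is an illustrative choice), which makes those conditions hold exactly in the sample.

```stata
* Sketch: why .8660254 appears in option a).
* Assumes var1 and var2 are standardized and uncorrelated; corr2data
* makes that hold exactly (means 0 and sds 1 are its defaults).
clear
matrix R = I(2)                           // 2 x 2 identity: corr(var1, var2) = 0
corr2data var1 var2, n(2000) corr(R)
display sqrt(1 - .5^2)                    // the weight on var2, about .8660254
gen varx = .5*var1 + sqrt(1 - .5^2)*var2
summarize varx                            // sd is 1, up to float rounding
correlate varx var1                       // correlation is .5
```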
I've used the corr2data approach to create variables like e1 and e2 that
were uncorrelated with everything else, and then added them to the
other variables I had created. See especially page 2 of
http://www3.nd.edu/~rwilliam/stats2/l21.pdf
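The idea of generating an error term alongside the other variables can be sketched as follows: create the "true" variable and the error together with corr2data so that their sample correlation is exactly zero, then add a rescaled copy of the error. The names x, e1 and y, the sample size, and the error weight .5 are hypothetical; the linked handout describes the actual procedure. Multiplying e1 by a constant does not change its correlation with x, so the added error stays exactly uncorrelated.

```stata
* Sketch: build an "error" term that is exactly uncorrelated with x,
* then add it to x. Names, N and the .5 weight are illustrative only.
clear
matrix R = I(2)                           // corr(x, e1) = 0 exactly
corr2data x e1, n(2000) corr(R)
gen y = x + .5*e1                         // y is x plus a little "error"
correlate x e1                            // 0, up to float rounding
correlate y x                             // 1/sqrt(1 + .5^2), about .894
```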
For existing data, you can also do something like
gen x2 = x + rnormal()
That adds random noise to x, but corr2data is better if you want
EXACT properties: by chance alone, the randomness added above could be
(and almost certainly will be) slightly correlated with the original x.
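The point about chance correlation is easy to check. This is a minimal sketch, assuming a simulated x, a fixed seed, and a noise standard deviation of .25 (all illustrative choices): the noise from rnormal() is independent of x in the population, but its sample correlation with x is almost never exactly zero. Note that rnormal() on its own has standard deviation 1; rnormal(0, s) lets you choose a smaller amount of noise.

```stata
* Sketch: noise from rnormal() is only approximately uncorrelated with x.
* The seed, N and the noise sd .25 are illustrative choices.
clear
set obs 2000
set seed 2013
gen x = rnormal()
gen noise = rnormal(0, .25)               // mean 0, sd .25
gen x2 = x + noise
correlate x noise                         // close to 0, but not exactly 0
```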
Instead of corr2data, consider using drawnorm if you want to
sample from a population with known properties, rather than
create a population with exactly those properties.
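The difference between the two commands shows up in the sample correlations. This is a minimal sketch, assuming a hypothetical 2 x 2 correlation matrix with r = .3, N = 2000 and a fixed seed: corr2data reproduces .3 exactly, whereas drawnorm draws a random sample from a population with that correlation, so its sample correlation is only close to .3 and changes with the seed.

```stata
* Sketch: corr2data (exact sample properties) vs drawnorm (random sample).
* The matrix C, N and the seed are illustrative choices.
clear
matrix C = (1, .3 \ .3, 1)
corr2data a b, n(2000) corr(C)
quietly correlate a b
display "corr2data: " r(rho)              // .3, up to float rounding
scalar rho_exact = r(rho)

clear                                     // drops data; keeps matrices and scalars
set seed 2013
drawnorm a b, n(2000) corr(C)
quietly correlate a b
display "drawnorm:  " r(rho)              // near .3, varies with the seed
scalar rho_sample = r(rho)
```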