Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables
From
Richard Williams <[email protected]>
To
[email protected], [email protected]
Subject
Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables
Date
Wed, 31 Aug 2011 19:23:10 -0500
fjc, I'll trust your math, but it seems awfully
complicated to me. Why not just do something like
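* Target correlation matrix for x, y, z (rows separated by \)
mat mCorr = (1, .6, .4 \ .6, 1, .5 \ .4, .5, 1)
* Create 100,000 observations whose sample correlations match mCorr
corr2data x y z, cstorage(full) corr(mCorr) n(100000) clear
* Check the correlations, then regress z on x and y
corr
reg z x y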
You can get the correlations (and also means &
sds) for the real x and y from your data set and
just plug them in, and then plug in the desired
correlations for x and y with z.
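A minimal sketch of that plug-in step, under stated assumptions: the shipped auto.dta dataset stands in for the real data (price as x, mpg as y), and the target correlations .4 and -.3 and the mean 0 / sd 1 for z are made-up illustrative values, not anything from this thread. Note that clear replaces the data in memory, so all three variables end up simulated.

```stata
* Assumption: price and mpg in auto.dta play the roles of the real x and y
sysuse auto, clear

* Observed correlation, means and sds of the real x and y
quietly correlate price mpg
local rxy = r(rho)
quietly summarize price
local mx = r(mean)
local sx = r(sd)
quietly summarize mpg
local my = r(mean)
local sy = r(sd)

* Observed Corr(x,y) plus the desired (illustrative) correlations with z
matrix C = (1, `rxy', .4 \ `rxy', 1, -.3 \ .4, -.3, 1)
matrix M = (`mx', `my', 0)
matrix S = (`sx', `sy', 1)

* Simulate x, y, z with those moments (replaces the data in memory)
corr2data x y z, corr(C) means(M) sds(S) n(1000) clear
correlate x y z
summarize x y z
```

If the requested correlations with z are not jointly feasible given Corr(x,y), the matrix is not positive semidefinite and corr2data should refuse to run.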
As a sidelight, any random variable u you
generate yourself will have a nonzero (albeit
small) correlation with x and y, because of
sampling variability. (Unless you generate u as
part of the corr2data command, in which case you
can force it to have 0 correlation with x and y.)
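To illustrate the sidelight, here is a small sketch contrasting the two ways of getting u. The correlation of .6 between x and y, the sample size, the seed, and the name u2 are arbitrary choices for the illustration.

```stata
* u generated inside corr2data: its sample correlation with x and y is 0
matrix C = (1, .6, 0 \ .6, 1, 0 \ 0, 0, 1)
corr2data x y u, corr(C) n(1000) clear
correlate x y u

* u2 drawn separately: independent in the population, but its sample
* correlation with x and y will typically be small and nonzero
set seed 12345
generate u2 = rnormal()
correlate x y u2
```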
At 03:23 PM 8/31/2011, fjc wrote:
Hi,
Thank you all for the quick and useful responses.
1. I can do with covariances instead of correlations, so the methods
proposed by Tirthankar and Richard work fine.
2. Still, if I wanted to stick to correlations, I think one can apply
the same ideas (as suggested in the previous responses):
Let z be given by
(0) z = a * x + b * y + c * u,
where x and y are the two variables in the dataset and u is a
zero-mean random variable independent of x and y.
From (0) one gets:
(1) Corr(x,z) = a * sd(x)/sd(z) + b * sd(y)/sd(z) * Corr(x,y)
(2) Corr(y,z) = b * sd(y)/sd(z) + a * sd(x)/sd(z) * Corr(x,y)
(3) Var(z) = a^2 * Var(x) + b^2 * Var(y) + c^2 * Var(u) + 2 * a * b * Cov(x,y)
Once we have chosen Corr(x,z), Corr(y,z) and Var(z), we can solve the
system above for a, b, and c. Actually, equations (1) and (2) can be
solved for a and b to get:
a = [sd(z)/sd(x)] * [Corr(x,z) - Corr(x,y)*Corr(y,z)] / (1 - Corr(x,y)^2)
b = [sd(z)/sd(y)] * [Corr(y,z) - Corr(x,y)*Corr(x,z)] / (1 - Corr(x,y)^2)
Then we can use (3) to obtain the value of c.
Finally, we can use (0) to generate z.
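The steps in (0) to (3) could be coded roughly as follows. This is only a sketch under assumptions: auto.dta's price and mpg stand in for x and y, the targets Corr(x,z) = .4, Corr(y,z) = -.3 and sd(z) = 1 are illustrative, and u is drawn as standard normal so that Var(u) = 1. Because u is a separate random draw, the sample correlations of z with x and y will only approximate the targets.

```stata
* Assumption: price and mpg in auto.dta play the roles of x and y
sysuse auto, clear
rename price x
rename mpg y

* Observed moments of x and y
quietly correlate x y
scalar rxy = r(rho)
quietly summarize x
scalar sx = r(sd)
quietly summarize y
scalar sy = r(sd)

* Illustrative targets for z
scalar rxz = .4
scalar ryz = -.3
scalar sz  = 1

* Solve (1) and (2) for a and b
scalar a = (sz/sx) * (rxz - rxy*ryz) / (1 - rxy^2)
scalar b = (sz/sy) * (ryz - rxy*rxz) / (1 - rxy^2)

* Solve (3) for c^2 with Var(u) = 1 and Cov(x,y) = rxy*sx*sy
scalar c2 = sz^2 - a^2*sx^2 - b^2*sy^2 - 2*a*b*rxy*sx*sy
* A non-positive c2 means the requested targets are infeasible
assert scalar(c2) > 0

* Generate z from (0)
set seed 12345
generate double u = rnormal()
generate double z = scalar(a)*x + scalar(b)*y + sqrt(scalar(c2))*u
correlate x y z
```

Since z here is centered only in its u component, its mean is a*mean(x) + b*mean(y); a constant can be added to (0) if a particular mean is wanted.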
Thanks again,
Francisco.
On Wed, Aug 31, 2011 at 3:59 PM, Richard Williams
<[email protected]> wrote:
> At 07:41 AM 8/31/2011, fjc wrote:
>>
>> Thanks, Tirthankar.
>>
>> This answers my question as originally posted.
>>
>> Now, something I didn't say in my earlier post (and I think I should
>> have) is that after I generate the new variable (z) I would like to
>> run a regression of y on x and z. But if I generate z in the way you
>> propose, I will get perfect collinearity. Is there any other way to
>> generate z without getting this collinearity?
>
> Slightly tweaking the earlier example, does this do what you want?
>
> mat mCorr = (1, .6, .4\ .6, 1, .5 \ .4, .5, 1)
> corr2data x y z, cstorage(full) corr(mCorr) n(100000) clear
> corr
> reg z x y
>
> Again, mCorr is a combo of the given correlations for x and y with the
> desired correlations for z. If you want, you can also specify standard
> deviations and means, both observed (for x and y) and desired (for z). I am
> faking all the data, although the correlations etc. can come from real data.
> If you want to do some combo of fake and real (e.g. generate a z using the
> realx and realy) it can probably be done but would take a bit more work.
>
>
> -------------------------------------------
> Richard Williams, Notre Dame Dept of Sociology
> OFFICE: (574)631-6668, (574)631-6463
> HOME:   (574)289-5227
> EMAIL:  [email protected]
> WWW:    http://www.nd.edu/~rwilliam
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam