Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables
From
Richard Williams <[email protected]>
To
[email protected], [email protected]
Subject
Re: st: RES: generating a variable with pre-specified correlations with other two (given) variables
Date
Wed, 31 Aug 2011 08:46:32 -0500
At 07:00 AM 8/31/2011, Tirthankar Chakravarty wrote:
This question has appeared a few times before - in that you want to
create a variable with a pattern of correlation with _existing_
variables, which -corr2data- does not do. In an example where means
are normalised to zero, this can be had by solving a system of linear
equations in appropriate expectations.
Suppose you generate a variable as
Z = a*X + b*Y ---(0)
where a and b are constants to be determined. Then you can derive the
following identities under the zero mean assumption:
Cov(Z, X) = a*Var(X) + b*Cov(X, Y) ---(1)
Cov(Z, Y) = b*Var(Y) + a*Cov(X, Y) ---(2)
Here you know everything (you set Cov(Z, X) and Cov(Z, Y)), and this
is a system of two equations in two unknowns, a and b. Solve them and
generate your variables as in equation (0).
So for example, if I have Cov(X, Y) = .6 and Var(X)=Var(Y)=1, and I want
Cov(Z, X) = .4 and Cov(Z, Y) = .5, then a = 0.15625 and b = 0.40625.
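As a worked check of this example, the system (1)-(2) can be solved by hand. The target covariances Cov(Z, X) = .4 and Cov(Z, Y) = .5 are an assumption here, taken from the covariance matrix used later in this message:

```latex
\begin{aligned}
a + 0.6\,b &= 0.4 \\
0.6\,a + b &= 0.5 \\
\Rightarrow\ (1 - 0.36)\,b &= 0.5 - 0.6(0.4) = 0.26, \quad b = 0.26/0.64 = 0.40625 \\
a &= 0.4 - 0.6(0.40625) = 0.15625 \\
\operatorname{Var}(Z) &= a^2 + b^2 + 2ab(0.6) = 0.265625
\end{aligned}
```

Note that Var(Z) comes out below 1 under these assumptions, so Z built this way has the target covariances with X and Y, but its correlations with them are larger (about 0.78 and 0.97).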
/************************************/
mat mCov = (1, .6\ .6, 1)
// generate x and y
corr2data x y, cstorage(full) cov(mCov) n(100000) clear
// generate z based on current sample of x and y
g z = .15625*x+.40625*y
corr, covariance
/************************************/
I am going to tweak your example a bit. Instead of doing the algebra
(and possibly screwing it up) let Stata do the work. Make mCov a
combo of the correlations you observe in your data and the
correlations you want for the new variable:
mat mCov = (1, .6, .4\ .6, 1, .5 \ .4, .5, 1)
corr2data x y z, cstorage(full) cov(mCov) n(100000) clear
reg z x y
Here are the regression results:
. reg z x y
      Source |       SS       df       MS              Number of obs =  100000
-------------+------------------------------           F(  2, 99997) =18084.56
       Model |  26562.2344     2  13281.1172           Prob > F      =  0.0000
    Residual |  73436.7656 99997  .734389687           R-squared     =  0.2656
-------------+------------------------------           Adj R-squared =  0.2656
       Total |  99998.9999 99999  .999999999           Root MSE      =  .85697

------------------------------------------------------------------------------
           z |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |     .15625   .0033875    46.13   0.000     .1496106    .1628894
           y |     .40625   .0033875   119.93   0.000     .3996106    .4128894
       _cons |  -1.06e-08     .00271    -0.00   1.000    -.0053115    .0053115
------------------------------------------------------------------------------
You could now apply these coefficients to your real variables (assuming
they are standardized, so that they match mCov), e.g.
gen newvar = .15625*realx + .40625 * realy
You can easily make this more complicated, e.g. include the standard
deviations and the means, add more Xs, etc. The -reg- command will do
all the algebra for you.
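The same coefficients can also be obtained directly with Stata's matrix functions, without simulating z first. The following is only a sketch under stated assumptions: x and y have unit variance and correlation .6, the targets are .4 and .5, and the names S, c, b, v, and s2 are made up for illustration. It also adds an independent noise term with variance equal to the residual variance from the regression, so that z has variance close to 1 and the target values hold approximately as correlations, not just covariances.

```stata
clear
set seed 12345

// covariance matrix of the existing variables x and y (assumed values)
matrix S = (1, .6 \ .6, 1)
corr2data x y, cov(S) n(100000)

// target Cov(z,x) and Cov(z,y) (assumed values)
matrix c = (.4 \ .5)

// solve S*b = c for the coefficients a and b
matrix b = invsym(S)*c
matrix list b

// residual variance needed for Var(z) = 1
matrix v = c'*b
scalar s2 = 1 - v[1,1]

// z = a*x + b*y + noise; the noise is only approximately
// uncorrelated with x and y in any finite sample
gen double z = b[1,1]*x + b[2,1]*y + sqrt(s2)*rnormal()
corr x y z
```

With real data, x and y would first need to be standardized (or S and c rescaled to the observed variances) for the same coefficients to apply.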
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam