Thanks for your prompt response. The code you sent does a great job
at producing the desired correlations. The only problem is that it
produces a variable u that is time-varying. The variable u is meant
to represent a random effect. As such, it should have a mean of 0 and
vary by subject only (i.e., be time-invariant). The variables X1 and
X2 should vary by subject (i) and time (t). The final product needs
to be a time-invariant u that is correlated 0.5 with time-varying X1
and 0.0 with time-varying X2. Is there a way to modify the code to
get the desired correlations?
--
Jim
On Tue, Dec 23, 2008 at 11:15 AM, Feiveson, Alan H. (JSC-SK311)
<[email protected]> wrote:
> Jim -
>
> If I understand your question, you want to "fix" your randomly generated
> X1, X2 and u so that their sample covariance matrix exactly equals the
> one you want. Here's one way to do this (see below.
>
> Al Feiveson
>
>
> . matrix V = I(3)
> . matrix V[1,2]=.5
> . matrix V[2,1]=.5
> . matrix V[1,3]=.3
> . matrix V[3,1]=.3
>
> . matrix list V
>
> symmetric V[3,3]
> c1 c2 c3
> r1 1
> r2 .5 1
> r3 .3 0 1
>
> . set obs 50
> obs was 0, now 50
>
> . drawnorm X1 X2 u,cov(V)
>
> . corr X1 X2 u,cov
> (obs=50)
>
> | X1 X2 u
> -------------+---------------------------
> X1 | 1.01209
> X2 | .467516 .953286
> u | .571851 .073683 1.12376
>
> . matrix accum A = X1 X2 u,dev noc
> (obs=50)
>
> . matrix A=(1/49)*A
>
> . matrix list A
>
> symmetric A[3,3]
> X1 X2 u
> X1 1.0120898
> X2 .46751642 .95328608
> u .5718511 .07368264 1.123759
>
> . matrix H=cholesky(V)
> . matrix G=cholesky(A)
> . matrix GI=inv(G)
>
> . matrix HGI=H*GI
>
> . matrix list HGI
>
> HGI[3,3]
> X1 X2 u
> r1 .99400935 0 0
> r2 .03111956 1.0085584 0
> r3 -.34919894 .07784363 1.082162
>
> . des
>
> Contains data
> obs: 50
> vars: 3
> size: 800 (99.9% of memory free)
> ------------------------------------------------------------------------
> ---------------------------
> storage display value
> variable name type format label variable label
> ------------------------------------------------------------------------
> ---------------------------
> X1 float %9.0g
> X2 float %9.0g
> u float %9.0g
> ------------------------------------------------------------------------
> ---------------------------
> Sorted by:
> Note: dataset has changed since last saved
>
> . gen y1=HGI[1,1]*X1
> . gen y2=HGI[2,1]*X1+HGI[2,2]*X2
> . gen y3=HGI[3,1]*X1+HGI[3,2]*X2+HGI[3,3]*u
>
> . corr y*,cov
> (obs=50)
>
> | y1 y2 y3
> -------------+---------------------------
> y1 | 1
> y2 | .5 1
> y3 | .3 -9.1e-10 1
>
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of James Shaw
> Sent: Tuesday, December 23, 2008 10:55 AM
> To: [email protected]
> Subject: st: Simulating multilevel data in Stata
>
> Dear Statalist members,
>
> I want to perform a simulation to show the inconsistency of the OLS and
> random effects estimators when one of the regressors is correlated with
> the unit-specific error component. The specifics of the simulation are
> as follows;
>
> Y[i,t] (the outcome to be modeled) = b0 + b1*X1[i,t] + b2*X2[i,t] + u[i]
> + e[i,t]
>
> i = 1,...,500 indexes subjects
> t = 1,...,3 indexes time (repeated observations on subjects)
>
> X1 and X2 are normally distributed random variables with arbitrary means
> and variances u is a normally distributed subject-specific error
> component with mean of 0 and arbitrary variance e is a normally
> distributed random error component with mean of 0 and arbitrary variance
>
> corr(X1,X2) = 0.5
> corr(X1,u) = 0.3
> corr(X2,u) = 0.0
> corr(X1,e) = corr(X2,e) = corr(u,e) = 0.0
>
> b0, b1, and b2 are parameters to be specified in the simulation
>
>
> I have been unable to identify a method that will ensure that
> corr(X1,u) equals the desired value. I tried the following method in
> which u was generated separately from X1 and X2 and cholesky
> decomposition was applied to generate transformations of the three
> random variables that would exhibit the desired correlations.
> However, this yielded a non-zero correlation between X2 and u.
>
> Method 1
> ***
> drop _all
> set obs 500
> gen n = _n
> gen u=invnorm(uniform())
> expand 3
> sort n
> gen n2 = _n
> gen t= (n2 - (n-1)*3)
> drawnorm x1 x2 e
> sort n
> mkmat x1 x2 u e, matrix(X)
> mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1) mat X2 =
> X*cholesky(c)
> ***
>
> A method that yielded somewhat better results involved generating X1,
> X2, u, and e with a pre-specified correlation matrix and then collapsing
> u so that it varied by subject only. This provided the correct values
> for corr(X1,X2) and corr(X2,u) but attenuated the correlation between X1
> and u. I presume that I could simply specify a higher value for
> corr(X1,u) when generating the variables so that the desired value would
> be achieved after u is collapsed. However, this would not be the most
> elegant solution.
>
> Method 2
> ***
> drop _all
> set obs 500
> mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1) gen n =
> _n expand 3 sort n gen n2 = _n gen t= (n2 - (n-1)*3) drawnorm x1 x2 u e,
> corr(c) sort n by n: egen u2 = mean(u)
> ***
>
> Any suggestions or references would be appreciated.
>
> Regards,
>
> Jim
>
>
> James W. Shaw, Ph.D., Pharm.D., M.P.H.
> Assistant Professor
> Department of Pharmacy Administration
> College of Pharmacy
> University of Illinois at Chicago
> 833 South Wood Street, M/C 871, Room 252 Chicago, IL 60612
> Tel.: 312-355-5666
> Fax: 312-996-0868
> Mobile Tel.: 215-852-3045
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
James W. Shaw, Ph.D., Pharm.D., M.P.H.
Assistant Professor
Department of Pharmacy Administration
College of Pharmacy
University of Illinois at Chicago
833 South Wood Street, M/C 871, Room 252
Chicago, IL 60612
Tel.: 312-355-5666
Fax: 312-996-0868
Mobile Tel.: 215-852-3045
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/