[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: RE: Simulating multilevel data in Stata

From	"James Shaw" <[email protected]>
To	[email protected]
Subject	Re: st: RE: Simulating multilevel data in Stata
Date	Tue, 23 Dec 2008 12:58:14 -0600

Thanks for your prompt response.  The code you sent does a great job
at producing the desired correlations.  The only problem is that it
produces a variable u that is time-varying.  The variable u is meant
to represent a random effect.  As such, it should have a mean of 0 and
vary by subject only (i.e., be time-invariant).  The variables X1 and
X2 should vary by subject (i) and time (t).  The final product needs
to be a time-invariant u that is correlated 0.5 with time-varying X1
and 0.0 with time-varying X2.  Is there a way to modify the code to
get the desired correlations?

--
Jim



On Tue, Dec 23, 2008 at 11:15 AM, Feiveson, Alan H. (JSC-SK311)
<[email protected]> wrote:
> Jim -
>
> If I understand your question, you want to "fix" your randomly generated
> X1, X2 and u so that their sample covariance matrix exactly equals the
> one you want. Here's one way to do this (see below.
>
> Al Feiveson
>
>
> . matrix V = I(3)
> . matrix V[1,2]=.5
> . matrix V[2,1]=.5
> . matrix V[1,3]=.3
> . matrix V[3,1]=.3
>
> . matrix list V
>
> symmetric V[3,3]
>    c1  c2  c3
> r1   1
> r2  .5   1
> r3  .3   0   1
>
> . set obs 50
> obs was 0, now 50
>
> . drawnorm X1 X2 u,cov(V)
>
> . corr X1 X2 u,cov
> (obs=50)
>
>             |       X1       X2        u
> -------------+---------------------------
>          X1 |  1.01209
>          X2 |  .467516  .953286
>           u |  .571851  .073683  1.12376
>
> . matrix accum A = X1 X2 u,dev noc
> (obs=50)
>
> . matrix A=(1/49)*A
>
> . matrix list A
>
> symmetric A[3,3]
>           X1         X2          u
> X1  1.0120898
> X2  .46751642  .95328608
>  u   .5718511  .07368264   1.123759
>
> . matrix H=cholesky(V)
> . matrix G=cholesky(A)
> . matrix GI=inv(G)
>
> . matrix HGI=H*GI
>
> . matrix list HGI
>
> HGI[3,3]
>            X1          X2           u
> r1   .99400935           0           0
> r2   .03111956   1.0085584           0
> r3  -.34919894   .07784363    1.082162
>
> . des
>
> Contains data
>  obs:            50
>  vars:             3
>  size:           800 (99.9% of memory free)
> ------------------------------------------------------------------------
> ---------------------------
>              storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------------------
> ---------------------------
> X1              float  %9.0g
> X2              float  %9.0g
> u               float  %9.0g
> ------------------------------------------------------------------------
> ---------------------------
> Sorted by:
>     Note:  dataset has changed since last saved
>
> . gen y1=HGI[1,1]*X1
> . gen y2=HGI[2,1]*X1+HGI[2,2]*X2
> . gen y3=HGI[3,1]*X1+HGI[3,2]*X2+HGI[3,3]*u
>
> . corr y*,cov
> (obs=50)
>
>             |       y1       y2       y3
> -------------+---------------------------
>          y1 |        1
>          y2 |       .5        1
>          y3 |       .3 -9.1e-10        1
>
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of James Shaw
> Sent: Tuesday, December 23, 2008 10:55 AM
> To: [email protected]
> Subject: st: Simulating multilevel data in Stata
>
> Dear Statalist members,
>
> I want to perform a simulation to show the inconsistency of the OLS and
> random effects estimators when one of the regressors is correlated with
> the unit-specific error component.  The specifics of the simulation are
> as follows;
>
> Y[i,t] (the outcome to be modeled) = b0 + b1*X1[i,t] + b2*X2[i,t] + u[i]
> + e[i,t]
>
> i = 1,...,500 indexes subjects
> t = 1,...,3 indexes time (repeated observations on subjects)
>
> X1 and X2 are normally distributed random variables with arbitrary means
> and variances u is a normally distributed subject-specific error
> component with mean of 0 and arbitrary variance e is a normally
> distributed random error component with mean of 0 and arbitrary variance
>
> corr(X1,X2) = 0.5
> corr(X1,u) = 0.3
> corr(X2,u) = 0.0
> corr(X1,e) = corr(X2,e) = corr(u,e) = 0.0
>
> b0, b1, and b2 are parameters to be specified in the simulation
>
>
> I have been unable to identify a method that will ensure that
> corr(X1,u) equals the desired value.  I tried the following method in
> which u was generated separately from X1 and X2 and cholesky
> decomposition was applied to generate transformations of the three
> random variables that would exhibit the desired correlations.
> However, this yielded a non-zero correlation between X2 and u.
>
> Method 1
> ***
> drop _all
> set obs 500
> gen n = _n
> gen u=invnorm(uniform())
> expand 3
> sort n
> gen n2 = _n
> gen t= (n2 - (n-1)*3)
> drawnorm x1 x2 e
> sort n
> mkmat x1 x2 u e, matrix(X)
> mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1) mat X2 =
> X*cholesky(c)
> ***
>
> A method that yielded somewhat better results involved generating X1,
> X2, u, and e with a pre-specified correlation matrix and then collapsing
> u so that it varied by subject only.  This provided the correct values
> for corr(X1,X2) and corr(X2,u) but attenuated the correlation between X1
> and u.  I presume that I could simply specify a higher value for
> corr(X1,u) when generating the variables so that the desired value would
> be achieved after u is collapsed.  However, this would not be the most
> elegant solution.
>
> Method 2
> ***
> drop _all
> set obs 500
> mat c =(1, .5, .3, 0 \ .5, 1, 0, 0 \ .3, 0, 1, 0 \ 0, 0, 0, 1) gen n =
> _n expand 3 sort n gen n2 = _n gen t= (n2 - (n-1)*3) drawnorm x1 x2 u e,
> corr(c) sort n by n: egen u2 = mean(u)
> ***
>
> Any suggestions or references would be appreciated.
>
> Regards,
>
> Jim
>
>
> James W. Shaw, Ph.D., Pharm.D., M.P.H.
> Assistant Professor
> Department of Pharmacy Administration
> College of Pharmacy
> University of Illinois at Chicago
> 833 South Wood Street, M/C 871, Room 252 Chicago, IL 60612
> Tel.: 312-355-5666
> Fax: 312-996-0868
> Mobile Tel.: 215-852-3045
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
James W. Shaw, Ph.D., Pharm.D., M.P.H.
Assistant Professor
Department of Pharmacy Administration
College of Pharmacy
University of Illinois at Chicago
833 South Wood Street, M/C 871, Room 252
Chicago, IL 60612
Tel.: 312-355-5666
Fax: 312-996-0868
Mobile Tel.: 215-852-3045
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- RE: st: RE: Simulating multilevel data in Stata
  - From: "Feiveson, Alan H. (JSC-SK311)" <[email protected]>

References:
- st: Simulating multilevel data in Stata
  - From: "James Shaw" <[email protected]>
- st: RE: Simulating multilevel data in Stata
  - From: "Feiveson, Alan H. (JSC-SK311)" <[email protected]>

Prev by Date: st: Confidence interval for a median with weighted, clustered data
Next by Date: RE: st: RE: Simulating multilevel data in Stata
Previous by thread: st: RE: Simulating multilevel data in Stata
Next by thread: RE: st: RE: Simulating multilevel data in Stata
Index(es):
- Date
- Thread