Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: Generate correlated binary data for fixed rho
From
"Georgia Ntani" <[email protected]>
To
"[email protected]" <[email protected]>
Subject
st: Generate correlated binary data for fixed rho
Date
Fri, 23 Nov 2012 17:56:32 +0000
Dear Statalisters
I am trying to generate binary correlated data for fixed parameter b
(effect of covariates on the outcome), number and size of clusters, and
ICC. For this, I follow the algorithm suggested by Santos et al (Santos
et al. Estimating adjusted prevalence ratio in clustered cross-sectional
epidemiological data. BMC Medical research methodology). Below, is what I
am trying to do in algorithm steps.
1. Set 10 clusters with 30 observations per cluster
2. Generate a dichotomous independent variable X1j. This will be a
group-level variable (X1j=1 for the first 5 clusters and 0 for the last 5
clusters)
3. Generate a continuous independent variable X2ij from a Normal(0,1)
distribution. That will be an individual level variable
4. Generate a normal variable u0, such that for given cluster j,
uoj~N(0,sigma_u^2), where uoj and uoj' are independent for j different
from j'. The intraclass correlation coefficient (ICC) is defined as
(sigma_u^2 )/(sigma_u^2 +(pi^2 /3))
5. Generate probability of the outcome that will be from a random effects
logistic model
Pij=exp(X*b+u0)/(1+exp(X*b+u0))
6. Generate a binary outcome Yij with probability Pij
The Stata commands that I used to run the above are
set obs 300
gen cluster=1 if _n<=30
forvalues i=2(1)10 {
replace cluster=`i' if _n<=`i'*30 & cluster==.
}
gen x1=1 if cluster<=5
replace x1=0 if x1==.
generate x2 = invnorm(uniform())
preserve
/* From the formula above,if ICC=0.03 then sigma_u will be ///
sqrt(0.01*_pi^2)/0.97 */
di sqrt(0.01*_pi^2)/0.97
set seed 54325821
clear all
corr2data u01 u02 u03 u04 u05 u06 u07 u08 u09 u010, n(30) ///
means(0 0 0 0 0 0 0 0 0 0) sds(.319 .319 .319 ///
.319 .319 .319 .319 .319 .319 .319)
gen sn=_n
reshape long u0 , i(sn) j(cluster)
drop sn
sort cluster u0
save randomerror.dta, replace
restore
sort cluster
merge cluster using randomerror
drop _merge
gen prob=exp(x1*0.47+x2*0.3665+u0)/(1+exp(x1*0.47+x2*0.3665+u0))
gen u=runiform()
gen y = (u > prob)
The problem now comes when I run the command - xtlogit y i.x1 x2,
i(cluster) quad(30) or-- What I get is the following
Random-effects logistic regression Number of obs = 300
Group variable: cluster Number of groups = 10
Random effects u_i ~ Gaussian Obs per group: min = 30
avg = 30.0
max = 30
Wald chi2(2) = 14.95
Log likelihood = -199.56666 Prob > chi2 = 0.0006
------------------------------------------------------------------------------
y | OR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.x1 | .5943426 .1412417 -2.19 0.029 .373039 .9469334
x2 | .6737118 .081714 -3.26 0.001 .5311688 .8545073
-------------+----------------------------------------------------------------
/lnsig2u | -15.53268 280.1426 -564.6021 533.5368
-------------+----------------------------------------------------------------
sigma_u | .0004238 .0593568 2.5e-123 7.2e+115
rho | 5.46e-08 .0000153 1.9e-246 1
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 0.00 Prob >= chibar2 = 1.000
My question is why I am getting a sigma_u of 0.0004? Aren't I supposed to
be getting sigma_u of 0.31 from the defined u0 (random error) above?
Any help on this would be highly appreciated
Many thanks
Georgia
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/