Jay Verkuilen wrote (excerpted):
David Airey wrote:
I have a typical pilot data set.<
A few things:
(1) Why not try xtmelogit? Can't do any harm to see what it does.
(2) Resampling methods seem tailor-made for this kind of problem.
Stata's got really impressive support for clustering and stratification
for both jackknifing and bootstrapping.
(3) On a problem like this, I'd give serious consideration to a fully
Bayesian model.
--------------------------------------------------------------------------------
Although resampling methods are very useful in other contexts, I've never
had much success with bootstrapping or jackknifing to overcome the inflation
of Type I error rates with mixed models applied to small samples. That's
why I suggested simulating the null distribution of the asymptotic test
statistic empirically. The do-file and its log below illustrates both of
these points. Bootstrapping with David's study parameters (six per group,
50 observations each, rho about 0.05) leaves the Type I error rate at about
what Dave found without bootstrapping, viz., double the 5% nominal.
Empirically generating the cumulative distribution of the likelihood-ratio
chi2 under these conditions, on the other hand, keeps the test size from
inflating (actually a little conservative). I suppose that this could also
be done using randomization (permutation) testing, but identifying
exchangeable observations might become problematic in a mixed model; you
might be better off going with a stratified nonparametric test, such as van
Elteren's, or just using a t test on a summary measure. Note that -xtlogit-
shows that we were successful at hitting the target rho in these cases (up
through at least a rho of 35%; it drops off at a target of 70%). I've had
less success attempting simulating with linear mixed models, mainly because
of the rather large and unpredictable downward bias in estimates of variance
components with unbalanced small samples, even using REML: getting in the
neighborhood of the variance components is a hit-or-miss affair.
Joseph Coveney
--------------------------------------------------------------------------------
. *
. * Resampling method
. *
[log redacted for brevity]
. summarize delta rho
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
delta | 300 .0199082 .286092 -.7870363 .8587052
rho | 300 .0348602 .0258169 2.03e-11 .134113
. foreach var of varlist p_* {
2. generate byte pos_`var' = `var' < 0.05
3. }
. summarize pos_*
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
pos_p_r | 300 .09 .2866599 0 1
pos_p_t | 300 .0533333 .2250728 0 1
. *
. * External empirical reference distribution method
. *
[log redacted for brevity]
. display in smcl as result `tmpnam0' / r(N)
.0200148
--------------------------------------------------------------------------------
clear *
set more off
set seed `=date("2008-03-13", "YMD")'
set seed0 `=date("2008-03-13", "YMD")'
*
capture program drop simem
program define simem, rclass
version 10
syntax [, Delta(real 0) Rho(real 0) VCE(passthru)]
drop _all
tempname p_t p_l
set obs 12
generate int pid = _n
generate byte trt = mod(_n, 2)
generate double mu = invlogit(trt * `delta' + ///
sqrt(`rho' / (1 - `rho') * _pi^2 / 3) * invnormal(uniform()))
assert !missing(mu) // if rho is misspecified
generate byte den = 50
rndbinx mu den
*
/* generate double pos_prim = bnlx
replace pos_prim = 50 - 1 / 2 / 50 if pos_prim == 50
replace pos_prim = 1 / 2 / 50 if !pos_prim
generate double logit_pi = logit(pos_prim / den) */
ttest bnlx, by(trt)
scalar define `p_t' = r(p)
/* ttest logit_pi, by(trt)
scalar define `p_l' = r(p) */
*
rename bnlx count1
generate byte count0 = den - count1
reshape long count, i(pid) j(positive)
drop if !count
expand count
xtlogit positive trt, i(pid) intpoints(30) `vce'
return scalar rho = e(rho)
return scalar p_r = 2 * normal(-abs(_b[trt] / _se[trt]))
return scalar chi2 = e(chi2)
return scalar delta = _b[trt]
return scalar p_t = `p_t'
* return scalar p_l = `p_l'
end
*
capture log close
log using btestm.smcl, smcl replace
*
* Resampling method
*
quietly simulate delta = r(delta) rho = r(rho) p_r = r(p_r) ///
p_t = r(p_t) chi2 = e(chi2), reps(300): ///
simem , rho(0.05) vce(bootstrap)
summarize delta rho
foreach var of varlist p_* {
generate byte pos_`var' = `var' < 0.05
}
summarize pos_*
*
* External empirical reference distribution method
*
keep chi2
tempfile tmpfil0
tempname tmpnam0 tmpnam1
quietly save `tmpfil0'
quietly simulate reference_chi2 = e(chi2), reps(300): simem , ///
rho(0.05)
quietly centile reference_chi2, centile(95)
scalar define `tmpnam0' = r(c_1)
quietly use `tmpfil0', clear
quietly count if chi2 >= `tmpnam0'
scalar define `tmpnam1' = r(N)
quietly count
display in smcl as result `tmpnam0' / r(N)
log close
exit
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/