[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: ttest or xtmelogit?

From	"Joseph Coveney" <[email protected]>
To	"Statalist" <[email protected]>
Subject	st: RE: ttest or xtmelogit?
Date	Fri, 14 Mar 2008 11:57:09 +0900

Jay Verkuilen wrote (excerpted):

David Airey wrote:

I have a typical pilot data set.<

A few things:

(1) Why not try xtmelogit? Can't do any harm to see what it does.

(2) Resampling methods seem tailor-made for this kind of problem.
Stata's got really impressive support for clustering and stratification
for both jackknifing and bootstrapping.

(3) On a problem like this, I'd give serious consideration to a fully
Bayesian model.

--------------------------------------------------------------------------------

Although resampling methods are very useful in other contexts, I've never
had much success with bootstrapping or jackknifing to overcome the inflation
of Type I error rates with mixed models applied to small samples.  That's
why I suggested simulating the null distribution of the asymptotic test
statistic empirically.  The do-file and its log below illustrates both of
these  points.  Bootstrapping with David's study parameters (six per group,
50 observations each, rho about 0.05) leaves the Type I error rate at about
what Dave found without bootstrapping, viz., double the 5% nominal.
Empirically generating the cumulative distribution of the likelihood-ratio
chi2 under these conditions, on the other hand, keeps the test size from
inflating (actually a little conservative).  I suppose that this could also
be done using randomization (permutation) testing, but identifying
exchangeable observations might become problematic in a mixed model; you
might be better off going with a stratified nonparametric test, such as van
Elteren's, or just using a t test on a summary measure.  Note that -xtlogit-
shows that we were successful at hitting the target rho in these cases (up
through at least a rho of 35%; it drops off at a target of 70%).  I've had
less success attempting simulating with linear mixed models, mainly because
of the rather large and unpredictable downward bias in estimates of variance
components with unbalanced small samples, even using REML:  getting in the
neighborhood of the variance components is a hit-or-miss affair.

Joseph Coveney

--------------------------------------------------------------------------------

. *
. * Resampling method
. *
[log redacted for brevity]

. summarize delta rho

   Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      delta |       300    .0199082     .286092  -.7870363   .8587052
        rho |       300    .0348602    .0258169   2.03e-11    .134113

. foreach var of varlist p_* {
 2.     generate byte pos_`var' = `var' < 0.05
 3. }

. summarize pos_*

   Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pos_p_r |       300         .09    .2866599          0          1
    pos_p_t |       300    .0533333    .2250728          0          1

. *
. * External empirical reference distribution method
. *
[log redacted for brevity]

. display in smcl as result `tmpnam0' / r(N)
.0200148

--------------------------------------------------------------------------------

clear *
set more off
set seed `=date("2008-03-13", "YMD")'
set seed0 `=date("2008-03-13", "YMD")'
*
capture program drop simem
program define simem, rclass
  version 10
  syntax [, Delta(real 0) Rho(real 0) VCE(passthru)]
   drop _all
   tempname p_t p_l
   set obs 12
   generate int pid = _n
   generate byte trt = mod(_n, 2)
   generate double mu = invlogit(trt * `delta' + ///
     sqrt(`rho' / (1 - `rho') * _pi^2 / 3) * invnormal(uniform()))
   assert !missing(mu) // if rho is misspecified
   generate byte den = 50
   rndbinx mu den
*
/*    generate double pos_prim = bnlx
   replace pos_prim = 50 - 1 / 2 / 50 if pos_prim == 50
   replace pos_prim = 1 / 2 / 50 if !pos_prim
   generate double logit_pi = logit(pos_prim / den) */
   ttest bnlx, by(trt)
   scalar define `p_t' = r(p)
/*    ttest logit_pi, by(trt)
   scalar define `p_l' = r(p) */
*
   rename bnlx count1
   generate byte count0 = den - count1
   reshape long count, i(pid) j(positive)
   drop if !count
   expand count
   xtlogit positive trt, i(pid) intpoints(30) `vce'
   return scalar rho = e(rho)
   return scalar p_r = 2 * normal(-abs(_b[trt] / _se[trt]))
   return scalar chi2 = e(chi2)
   return scalar delta = _b[trt]
   return scalar p_t = `p_t'
*    return scalar p_l = `p_l'
end
*
capture log close
log using btestm.smcl, smcl replace
*
* Resampling method
*
quietly simulate delta = r(delta) rho = r(rho) p_r = r(p_r) ///
         p_t = r(p_t) chi2 = e(chi2), reps(300): ///
         simem , rho(0.05) vce(bootstrap)
summarize delta rho
foreach var of varlist p_* {
   generate byte pos_`var' = `var' < 0.05
}
summarize pos_*
*
* External empirical reference distribution method
*
keep chi2
tempfile tmpfil0
tempname tmpnam0 tmpnam1
quietly save `tmpfil0'
quietly simulate reference_chi2 = e(chi2), reps(300): simem , ///
 rho(0.05)
quietly centile reference_chi2, centile(95)
scalar define `tmpnam0' = r(c_1)
quietly use `tmpfil0', clear
quietly count if chi2 >= `tmpnam0'
scalar define `tmpnam1' = r(N)
quietly count
display in smcl as result `tmpnam0' / r(N)
log close
exit


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- st: RE: RE: ttest or xtmelogit?
  - From: "Verkuilen, Jay" <[email protected]>

Prev by Date: st: imputation
Next by Date: st: RE: RE: ttest or xtmelogit?
Previous by thread: st: population weigted data and limited dependent variable first stageequation
Next by thread: st: RE: RE: ttest or xtmelogit?
Index(es):
- Date
- Thread