David Airey wrote:
I have a typical pilot data set. Small. I have 12 mice, 6 from one
group, 6 from another. For each mouse I have about 50 yes/no scores.
50 was enough to get precision on a given mouse. I'm interested in the
group difference. In the past I used -xtgee- with the i(mouse) option,
i.e., mouse as the cluster, together with family(binomial) and
link(logit) (sketched below). Previously, though, I did this with double
the number of mice, so I felt I had enough clusters. Here I am
uncomfortable about the number of clusters (12). The same concern applies
to -xtmelogit-, new in Stata 10. Given the number of mice, is it better
simply to transform each animal's summary statistic (the mean of its
yes/no vector), e.g., with an arcsine or logit transform, and use a t test?
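For reference, a minimal sketch of the kind of -xtgee- call described
here, assuming the data are in long form with one record per yes/no
score; y and trt are hypothetical names for the 0/1 outcome and the
group indicator.

* hypothetical long-form setup: 0/1 outcome y, group indicator trt,
* and mouse as the cluster identifier
xtset mouse
xtgee y trt, family(binomial) link(logit)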
--------------------------------------------------------------------------------
Unless the intraclass correlation is high, the distribution of sums of 50
binary observations per mouse should look quite close to normal, even with
expected proportions in the neighborhood of 10% or 90%. Also, the t test is
supposed to be reasonably robust in the absence of skewness.
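For intuition (numbers mine, not from the post): with within-mouse
intraclass correlation rho, the variance of a mouse's sum of n
exchangeable binary scores is n*p*(1 - p)*(1 + (n - 1)*rho), the usual
design-effect inflation. For example:

* variance of the sum of n = 50 exchangeable 0/1 scores with p = 0.1,
* using an illustrative intraclass correlation of rho = 0.05
display 50 * 0.1 * 0.9 * (1 + 49 * 0.05)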
It looks like the t test would work with untransformed sum scores of 50
observations for each of 12 mice within a reasonable range of intraclass
correlations (see the simulation do-file below). The simulation compares
untransformed sum scores to logit-transformed mean scores (with the same
fix-up for floors and ceilings as used in the arcsine-of-square-root
transform). As you might expect from the fix-up moving everything toward the
middle, untransformed sum scores come out ahead in power. Both maintain the
Type I error rate well.
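For concreteness, the boundary fix-up used in the do-file below nudges
counts of 0 and 50 just inside the limits before they are converted to
proportions and logit-transformed, so the transform stays finite:

* logits of the nudged proportions for counts of 0 and 50 out of 50,
* exactly as computed by the fix-up lines in the program below
display logit((1/2/50)/50)
display logit((50 - 1/2/50)/50)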
It would be interesting to see how Jeff Herrin's -cltest- would fare by
comparison, but it doesn't seem to return scalars for -simulate- to grab
onto.
Joseph Coveney
clear *
set more off
set seed `=date("2008-03-11", "YMD")'
set seed0 `=date("2008-03-11", "YMD")'
*
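* simem: rclass program called by -simulate- below. It draws a per-mouse
* success probability mu = 0.5 + trt*delta + noise, generates a binomial
* count out of 50 for each of 12 mice with the user-written -rndbinx-
* (not official Stata; it must be installed separately, and it leaves the
* count in bnlx), and returns p-values from t tests on the raw counts
* (p_t) and on logit-transformed proportions (p_l, with counts of 0 and
* 50 nudged inside the boundaries before transforming)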
capture program drop simem
program define simem, rclass
    version 10
    syntax [, delta(real 0) s(real 0)]
    tempname p_t
    drop _all
    set obs 12
    generate byte mouse = _n
    generate byte trt = mod(_n, 2)
    generate double mu = 0.5 + trt * `delta' + ///
        `s' * _pi / sqrt(3) * invnormal(uniform())
    generate byte den = 50
    rndbinx mu den
    generate double pos_prim = bnlx
    replace pos_prim = 50 - 1 / 2 / 50 if pos_prim == 50
    replace pos_prim = 1 / 2 / 50 if !pos_prim
    generate double logit_pi = logit(pos_prim / den)
    ttest bnlx, by(trt)
    scalar define `p_t' = r(p)
    ttest logit_pi, by(trt)
    return scalar p_l = r(p)
    return scalar p_t = `p_t'
end
*
* Test size
*
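* (the means of pos_t and pos_l reported by -summarize- are the empirical
* rejection rates at nominal alpha = 0.05; with delta = 0 they estimate size)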
forvalues s = 0.1(0.1)0.5 {
    display in smcl as text _newline(1) "Null hypothesis; s = `s'"
    quietly simulate t = r(p_t) l = r(p_l), reps(300) nodots: simem, s(`s')
    foreach var of varlist t l {
        generate byte pos_`var' = `var' < 0.05
    }
    summarize pos_*
}
*
* Relative power
*
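* (same summaries as above; with delta > 0 the means of pos_t and pos_l
* estimate power)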
forvalues s = 0.1(0.1)0.5 {
    forvalues delta = 0.1(0.05)0.2 {
        display in smcl as text _newline(1) "Delta = `delta'; s = `s'"
        quietly simulate t = r(p_t) l = r(p_l), reps(300) nodots: ///
            simem, s(`s') delta(`delta')
        foreach var of varlist t l {
            generate byte pos_`var' = `var' < 0.05
        }
        summarize pos_*
    }
}
exit