Re: st: group size needed for mixed models (binary response)
Susan,
I would say your doubts are well founded. There aren't really
enough fawns/doe to estimate a mixed model. I often run into
this problem with hospital admissions, where the same patient
can be admitted twice in 12 months for the same condition, but
most patients have just one admission.
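To put numbers on "not enough fawns/doe", here is a quick back-of-the-envelope sketch in Python using the counts quoted from your message below. The twin count assumes, as your message suggests, that no doe has more than two fawns in a sample; the function names are just illustrative.

```python
# (fawns, mothers) per sample, taken from the counts in Susan's message.
samples = {
    "summer whitetail": (129, 124),
    "summer mule deer": (207, 177),
    "winter whitetail": (26, 25),
    "winter mule deer": (129, 103),
}

def mean_group_size(fawns, mothers):
    """Average number of fawns per doe."""
    return fawns / mothers

def does_with_two_fawns(fawns, mothers):
    """Number of does contributing two fawns.

    Assumption: every doe has either one or two fawns in the sample,
    so each fawn beyond the first per doe marks one two-fawn doe.
    """
    return fawns - mothers

for name, (fawns, mothers) in samples.items():
    size = mean_group_size(fawns, mothers)
    twins = does_with_two_fawns(fawns, mothers)
    print(f"{name}: {size:.2f} fawns/doe, {twins} of {mothers} does with two fawns")
```

Under that assumption the average group size stays below 1.3 in every sample, so only a small minority of does carry any information about within-mother correlation.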
First, you should estimate the intra-class correlation of survival
for each of your study groups (in Stata use -loneway-). If this
is not much different from zero, then you can ignore the grouping
and use standard logit.
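For anyone who wants to see what such an estimate involves, here is a minimal Python sketch of the usual one-way ANOVA estimator of the intra-class correlation for unequal group sizes. It is meant as an illustration of the idea, not as a reproduction of -loneway- output; the function name and the toy data are made up.

```python
from collections import defaultdict

def anova_icc(outcomes, groups):
    """One-way ANOVA estimate of the intra-class correlation.

    outcomes: 0/1 survival indicators, one per fawn.
    groups:   mother identifiers, aligned with outcomes.
    """
    by_group = defaultdict(list)
    for y, g in zip(outcomes, groups):
        by_group[g].append(y)

    k = len(by_group)          # number of mothers
    n_total = len(outcomes)    # number of fawns
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and one group with two or more members")

    grand_mean = sum(outcomes) / n_total
    means = {g: sum(v) / len(v) for g, v in by_group.items()}

    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in by_group.items())
    ss_within = sum((y - means[g]) ** 2 for g, v in by_group.items() for y in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average group size for unbalanced data.
    n0 = (n_total - sum(len(v) ** 2 for v in by_group.values()) / n_total) / (k - 1)

    denominator = ms_between + (n0 - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variation in the outcome; ICC is undefined")
    return (ms_between - ms_within) / denominator

# Toy data (hypothetical): two does with twins that share a fate, two singletons.
survived = [1, 1, 0, 0, 1, 0]
doe_id = ["A", "A", "B", "B", "C", "D"]
print(anova_icc(survived, doe_id))
```

Note that singleton does add nothing to the within-group sum of squares, so with mostly one fawn per doe the estimate rests on very few mothers and will be imprecise.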
If there is a group effect, then my instinct would be to select
a random fawn per mother as the analytic sample. If there is
reason to think that number of siblings might affect survival,
you can add a covariate for each retained fawn equal to the
number of siblings. Then again use standard logit, because you
are working with only one fawn/doe.
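The selection step might look like the following Python sketch. The record fields (doe_id, fawn_id, survived) and the function name are hypothetical, and the seed is fixed only so that the random draw can be reproduced.

```python
import random
from collections import defaultdict

def one_fawn_per_doe(records, seed=12345):
    """Keep one randomly chosen fawn per doe and attach its sibling count."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    litters = defaultdict(list)
    for rec in records:
        litters[rec["doe_id"]].append(rec)

    sample = []
    for doe in sorted(litters):
        litter = litters[doe]
        chosen = dict(rng.choice(litter))       # copy, leave the input untouched
        chosen["n_siblings"] = len(litter) - 1  # siblings of the retained fawn
        sample.append(chosen)
    return sample

# Toy data (hypothetical): doe A has twins, does B and C have one fawn each.
fawns = [
    {"doe_id": "A", "fawn_id": 1, "survived": 1},
    {"doe_id": "A", "fawn_id": 2, "survived": 0},
    {"doe_id": "B", "fawn_id": 3, "survived": 1},
    {"doe_id": "C", "fawn_id": 4, "survived": 0},
]
analytic_sample = one_fawn_per_doe(fawns)
for row in analytic_sample:
    print(row)
```

The resulting rows are independent across does, so they can go straight into a standard logit with species and n_siblings as covariates.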
Hope this helps,
Jeph
Susan Lingle wrote:
Dear Stata-Listers
My question is a statistical one, not anything specific to use of Stata.
From reading the archives, there are clearly many knowledgeable people
out there, and I am hoping someone can advise whether a mixed model is
appropriate to use to analyse my data.
I have a data set for deer fawns, in which I want to test whether fawns
of one species, whitetails, are more likely to die from predation during
summer than mule deer. I plan to run a separate analysis to test whether
the other species, mule deer, are more likely than whitetails to die
during winter. For the summer sample, there are 129 whitetail fawns from
124 mothers and 207 mule deer from 177 mothers. For the winter sample,
there are 26 whitetail fawns from 25 mothers, and 129 mule deer from 103
mothers. This means there is one measurement (live or die) for each fawn.
Someone strongly recommended that I use a mixed model with the mother’s
identity as a random factor to analyse the survival data (e.g.,
xtmelogit in Stata). I certainly appreciate the value of including
family effects as random factors when there is a large enough family to
estimate those effects, or the variance associated with those effects.
But in this case, most females have one fawn so the data appear
insufficient to estimate random effects or the variance, and I believe
the latter is needed to estimate an intercept.
I have searched far and wide for an answer. The closest thing I found,
and it seems to make sense, is an article suggesting that a large group
size (n=50) as well as a large number of groups (n=100) are needed for a
mixed effects logistic regression to produce decent estimates of fixed
effects as well as random effects (citation below). They found severe
flaws when group size was less than 5. Apparently, the sample size
issues are not as restrictive for linear models, although I get the
impression one still would need more than n=5 for each group.
Is it appropriate to use mixed models for binary response variables, or
even for linear response variables, when the groups usually consist of 1
individual and at most 2 individuals?
Can anyone advise? It would be greatly appreciated.
Susan
Article: R. Moineddin et al 2007. A simulation study of sample size for
multilevel logistic regression models. BMC Medical Research Methodology
7:34.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/