Thank you, Joseph. I appreciate your assistance very much. Thank you
not only for your valuable comments, but also for your patience.
Ricardo.
--- Joseph Coveney <[email protected]> wrote:
> Ricardo Ovaldia wrote:
>
> > I am a bit baffled by the assertion that 50 clusters and 410
> > observations is a small sample size. I know it is not big, but I
> > would not consider it small either.
>
> Whether 50 clusters and 410 total observations is small or not depends
> upon the task. Advocating caution to ensure that the sample size is
> adequate for the intended purpose is not the same as asserting that a
> particular sample size is small. For population-average GEE, which is
> sensitive to the number of clusters, rules of thumb for sample size for
> ranges of predictors are given in M. E. Stokes, C. S. Davis & G. G. Koch,
> _Categorical Data Analysis Using the SAS System_, Second Edition (Cary,
> North Carolina: SAS Institute, 2000), p. 479. If you have many candidate
> predictors among those for patients and physicians, my guess is that the
> authors would say that 50 clusters is pretty dicey.
>
> I don't recall having recently run across any corresponding guidance for
> random-effects logistic regression, which depends more upon within-cluster
> correlation and total observations. Can -simulate- tell you about the
> adequacy of the sample size for your purposes (e.g., for confidence
> interval coverage) in your particular dataset with the parameters set at
> their estimates? Generating a correlated binary variate to match the
> observed rho is tough, but you might be able to get reasonably close. If
> you're satisfied with the results of the simulation for the model's
> intended use, then the sample size is not too small.
>
> In the simple-minded illustration below, with a sample size of 50
> clusters, a uniform cluster size of six observations and a
> moderate-to-high within-cluster correlation (rho is about 80% or so),
> the test size was 11.5% at the nominal 5% Type I error rate. That's more
> than double the nominal rate, and if the purpose is hypothesis testing,
> then the sample size would be considered small: too small given the
> nature of the data and the objective. This improves, of course, when
> there is no within-cluster correlation; in the simple example below it
> falls to 6.7%, which is still substantially larger than nominal. But if
> this isn't critical for the objective, then the sample would not
> necessarily be considered small.
>
> > The question posed in this phase of analysis is rather simple: Which
> > physician and patient characteristics are important in predicting
> > patient referral?
>
> Have you considered coupling modeling with graphical analysis at this
> phase? The strength and nature of the relationships observed graphically
> could be combined with knowledge of the subject matter to judge the
> importance of predictors. Plots could be made of observations or of
> predictions from models after holding one or more covariates at
> reference values. If your audience doesn't feel comfortable judging the
> strength or importance of a relationship from a graphical presentation,
> then the predictions can be described numerically, either with summary
> statistics (including tabulations) or by a model, perhaps with
> standardized coefficients if that makes it easier for your audience. For
> the next phase, the model can be made parsimonious based upon what's
> observed in the plots or what's judged unimportant in earlier stages of
> exploration. It might be beneficial to use two models to describe your
> observations: one, a conditional logistic regression with physicians as
> groups, to describe patient characteristics that predict referral; the
> other, a count model, to describe physician characteristics that predict
> referral rates.
>
> Joseph Coveney
>
> ----------------------------------------------------------------------------
>
> clear
> set more off
> set seed 20040809
> set obs 6
> forvalues i = 1/6 {
> generate float rho`i' = 0.8
> replace rho`i' = 1 in `i'
> }
> mkmat rho*, matrix(A)
> *
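> program define xtlogitsimc, rclass
> version 8.2
> * Draw six latent normals per cluster with correlation matrix A
> drawnorm dep1 dep2 dep3 dep4 dep5 dep6, corr(A) n(50) clear
> generate byte pid = _n
> * Assign the second half of the clusters to treatment
> generate byte trt = _n > _N / 2
> reshape long dep, i(pid) j(tim)
> * Dichotomize the latent variate at zero
> replace dep = dep > 0
> compress
> * Likelihood-ratio test of the full model against the intercept-only model
> xi: xtlogit dep trt i.tim, i(pid) re
> estimates store A
> xtlogit dep, i(pid) re
> estimates store B
> lrtest A B
> return scalar p = r(p)
> end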
> *
> simulate "xtlogitsimc" p = r(p), reps(1000)
> generate byte pos = p < 0.05
> replace pos = . if p >= .
> summarize pos
> *
> *
> *
> program define xtlogitsimi, rclass
> version 8.2
> replace dep = uniform() > 0.5
> xi: xtlogit dep trt i.tim, i(pid) re
> estimates store A
> xtlogit dep, i(pid) re
> estimates store B
> lrtest A B
> return scalar p = r(p)
> estimates drop _all
> end
> *
> clear
> set obs 50
> generate byte pid = _n
> generate byte trt = _n > _N / 2
> forvalues i = 1/6 {
> generate byte dep`i' = .
> }
> reshape long dep, i(pid) j(tim)
> simulate "xtlogitsimi" p = r(p), reps(1000)
> generate byte pos = p < 0.05
> replace pos = . if p >= .
> summarize pos
> exit
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
=====
Ricardo Ovaldia, MS
Statistician
Oklahoma City, OK