G. ter Riet:
How about something like this:
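* Example data: NHANES II, with three binary predictors v1-v3 and outcome y
webuse nhanes2, clear
ren sex v1
g v2=race==2 if !mi(race)
g v3=region==3 if !mi(region)
ren diabetes y
* One id per PSU within stratum, used as the resampling cluster
egen c=group(strata psu)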
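cap pr drop rocb
program rocb, rclass
    * First argument is the outcome, the next three are the predictors
    logit `1' `2' `3' `4' [pw=finalwgt]
    predict p
    roctab `1' p
    ret scalar rocb = r(area)
    * Mean predicted probability within each covariate pattern
    * (g must exist before the program is called; it is created below)
    levelsof g, loc(gs)
    foreach i of loc gs {
        su p if g==`i'
        ret scalar p`i'=r(mean)
    }
    drop p
    eret clear
end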
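* Covariate patterns: one group per combination of v1, v2 and v3
egen g=group(v1 v2 v3), label
* Keep each pattern's value label so the bootstrap results can be labelled
levelsof g, loc(gs)
foreach i of loc gs {
    loc p`i' "`:label (g) `i''"
}
loc r "r(rocb) r(p1) r(p2) r(p3) r(p4) r(p5) r(p6) r(p7) r(p8)"
bs `r', reps(200) strat(g) sav(p) cl(c): rocb y v1 v2 v3
* In p.dta, _bs_1 is the AUC and _bs_2 to _bs_9 are the eight pattern means
use p, clear
forv i=1/8 {
    la var _bs_`=`i'+1' "`p`i''"
}
d
su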
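To say something about the variability as well, one option is to summarise the saved replicates directly. This is only a sketch: it assumes the bs call above finished and saved its replicates to p.dta, with _bs_1 holding the AUC and _bs_2 to _bs_9 the eight pattern means. The 2.5th and 97.5th percentiles give a simple percentile interval, which may or may not be the interval you want to report.

```stata
* Assumes p.dta was written by the bs call above (an assumption, not checked here)
use p, clear
* Mean, SD and number of replicates for the AUC and each covariate pattern
tabstat _bs_*, statistics(mean sd n) columns(statistics)
* Median and percentile-based 95% limits across the bootstrap replicates
centile _bs_*, centile(2.5 50 97.5)
```

The mean column is the average predicted probability per covariate pattern across the replicates, and the SD column is a rough guide to how much each one moves from sample to sample.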
On Fri, Oct 3, 2008 at 10:02 AM, G. ter Riet wrote:
> Dear Statalisters,
> In medical research on prediction or diagnosis, we often use the bootstrap to calculate confidence intervals for the area under the ROC curve (AUC) of a particular logistic regression model that is used to predict an event (e.g. death or some target illness). The AUC is a global measure of how well the model discriminates between those with and without an event. A program that does this might look as follows:
>
> capture program drop rocb
> program rocb, rclass
> logit `1' `2' `3' `4'
> predict p
> roctab `1' p
> drop p
> return scalar rocb = r(area)
> end
> bootstrap auc=r(rocb), reps(200): rocb depvar indepvar1 indepvar2 indepvar3
>
> What I should like to do, however, is to give readers of my paper an impression of the impact of bootstrapping, not just on the AUC, but on the distribution of predicted probabilities calculated from the logistic model, since most clinicians are not that comfortable with the AUC of a ROC curve. Suppose my indepvars are all binary. Then I'd have 2^3=8 covariate patterns and potentially a unique predicted probability for each covariate pattern in each bootstrap sample. For each covariate pattern, I'd like to average the predicted probabilities across the 200 samples (and perhaps say something about their variability).
> My Stata programming skills are not good enough to solve this efficiently. I'd greatly appreciate any help. Of course, any comments on whether you think the whole idea is worthwhile are welcome too.
> Cheers, Gerben ter Riet (epidemiologist, AMC, Dept General Practice, Amsterdam, NL)