Re: st: Bootstrap to compare ROC area on imputed dataset
From: roland andersson <[email protected]>
To: [email protected]
Subject: Re: st: Bootstrap to compare ROC area on imputed dataset
Date: Fri, 18 Nov 2011 07:56:27 +0100
Thank you, Cameron.
I will read these.
In the meantime I hope for some help with Stata code (as I am not a
programmer). I must correct a typo - we would normally use roccomp
(not roctab) for the comparison of the ROC areas, but this does not
work with mim.
With roctab we can get the combined area over the imputed datasets,
and we can also use bootstrap for each imputed dataset, but we do not
know how to get the combined bootstrapped area over the imputed
datasets nor how to do the comparison.
I guess that a combination of mim, bootstrap and roctab must be possible.
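As an illustration of how such a combination might look, here is a minimal sketch in Python rather than Stata; it does not use mim, bootstrap or roctab themselves. It assumes each imputed dataset is available as a list of (diagnosis, score1, score2) rows, draws b bootstrap resamples from each of the k imputed datasets, and divides the mean AUC difference by the standard deviation of the k*b bootstrapped differences, i.e. the statistic D=(AUC1-AUC2)/SE_b from the quoted reply below. The function names, the row layout, and the choice to pool all bootstrap differences into a single standard deviation are assumptions made for the example, not an established rule for combining bootstrap and imputation variance.

```python
import random
import statistics


def auc(labels, scores):
    """Nonparametric (Mann-Whitney) ROC area; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_diff(rows):
    """AUC(score1) - AUC(score2) for rows of (diagnosis, score1, score2)."""
    y = [r[0] for r in rows]
    return auc(y, [r[1] for r in rows]) - auc(y, [r[2] for r in rows])


def compare_auc(imputed, b=200, seed=1):
    """Return (mean AUC difference, SE_b, D) over k imputed datasets.

    Assumption: SE_b is the standard deviation of all k*b bootstrapped
    differences pooled together.
    """
    rng = random.Random(seed)
    point = statistics.mean([auc_diff(d) for d in imputed])
    boot = []
    for d in imputed:
        done = 0
        while done < b:
            sample = rng.choices(d, k=len(d))  # resample rows with replacement
            try:
                boot.append(auc_diff(sample))
            except ValueError:
                continue  # resample contained only one class; draw again
            done += 1
    se_b = statistics.stdev(boot)
    return point, se_b, point / se_b
```

Calling compare_auc(list_of_imputed_datasets, b=200) returns the averaged difference, SE_b and D. Whether D can be referred to a standard normal distribution in this setting is not something the sketch establishes.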
Regards
Roland
2011/11/17 Cameron McIntosh <[email protected]>:
> Roland,
>
> You're asking for both specific Stata code and more general methodological guidance. I can try to take a bit of a crack at the latter. Bootstrapping in conjunction with imputation is quite intensive, although it can of course be done (after all, the two are similar in a number of ways):
>
> Efron, B. (1994). Missing Data, imputation, and the bootstrap. Journal of the American Statistical Association, 89(426), 463-475.
>
> Heymans, M.W., van Buuren, S., Knol, D.K., van Mechelen, M., & de Vet, H.C.W. (2007). Variable selection under multiple imputation using the bootstrap in a prognostic study. BMC Medical Research Methodology, 7:33. http://www.biomedcentral.com/content/pdf/1471-2288-7-33.pdf
>
> Kim, J.K., Brick, J.M., Fuller, W.A., & Kalton, G. (2006). On the bias of the multiple-imputation variance estimator in survey sampling. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3), 509–521.
>
> Kim, J.K., & Rao, J.N.K. (2009). A unified approach to linearization variance estimation from survey data after imputation for item nonresponse. Biometrika, 96(4), 917-932.
>
> Davison, A.C., & Sardy, S. (2007). Resampling Variance Estimation in Surveys with Missing Data. Journal of Official Statistics, 23(3), 371–386.
>
> Eltinge, J.L. (1996). On Variance Estimation With Imputed Survey Data: Comment. Journal of the American Statistical Association, 91(434), 513-515.
>
> Saigo, H., Shao, J., & Sitter, R.R. (2001). A Repeated Half-Sample Bootstrap and Balanced Repeated Replications for Randomly Imputed Data. Survey Methodology, 27(2), 189-196. http://www.statcan.gc.ca/ads-annonces/12-001-x/6095-eng.pdf
>
> Shao, J., & Sitter, R.R. (1996). Bootstrap for imputed survey data. Journal of the American Statistical Association, 91, 1278-1288.
>
> Chen, J., Rao, J.N.K., & Sitter, R.R. (2000). Adjusted imputation for missing data in complex surveys. Statistica Sinica, 10, 1153-1169.
>
> Essentially, what you want is D=(AUC1-AUC2)/SE_b,
>
> where AUC1 and AUC2 are the original AUCs from the two models being compared, and SE_b is the standard error of the bootstrapped AUC differences. I don't imagine this would be very hard to program, perhaps in R if not in Stata. I think you would just bootstrap from each imputed data set so this would expand the number of replications as follows: k imputations * b bootstrap samples. You also definitely need to see (and you might want to try the empirical likelihood approach):
>
> Long, Q., Zhang, X., & Hsu, C.-H. (2011). Nonparametric multiple imputation for receiver operating characteristics analysis when some biomarker values are missing at random. Statistics in Medicine, Early View. http://onlinelibrary.wiley.com/doi/10.1002/sim.4338/abstract;jsessionid=63E100FD9A64CCB7B6C8E6D57CA08581.d01t02
>
> Liu, D., & Zhou, X.-H. (January 21, 2011). Semiparametric Estimation of the Covariate-Specific ROC Curve in Presence of Ignorable Verification Bias. UW Biostatistics Working Paper Series. Working Paper 374. Seattle, WA: University of Washington - Seattle Campus. http://www.bepress.com/cgi/viewcontent.cgi?article=1213&context=uwbiostat
>
> An, Y. (2011). Empirical Likelihood Confidence Intervals for ROC Curves with Missing Data. Mathematics Theses. Paper 95. http://digitalarchive.gsu.edu/math_theses/95
>
> Liu, X. (2010). Semi-Empirical Likelihood Confidence Intervals for the ROC Curve with Missing Data. Mathematics Theses. Paper 89. http://digitalarchive.gsu.edu/math_theses/89
>
> Janssen, K.J.M., Vergouwe, Y., Donders, A.R.T., Harrell, F.E., Jr., Chen, Q., Grobbee, D.E., & Moons, K.G.M. (2009). Dealing with Missing Predictor Values When Applying Clinical Prediction Models. Clinical Chemistry, 55, 994-1001. http://www.clinchem.org/cgi/reprint/55/5/994 http://www.clinchem.org/cgi/content/full/clinchem.2008.115345/DC1
>
> Liu, D., & Zhou, X.-H. (2010). A model for adjusting for nonignorable verification bias in estimation of the ROC curve and its area with likelihood-based approach. Biometrics, 66(4), 1119-1128.
>
> Hope this helps,
>
> Cam
>
>> Date: Thu, 17 Nov 2011 18:04:56 +0100
>> Subject: st: Bootstrap to compare ROC area on imputed dataset
>> From: [email protected]
>> To: [email protected]
>>
>> We are analysing the discriminating capacity of a clinical score. Because
>> of some missing values we had to use an imputed dataset. We have now
>> constructed a new clinical score and want to compare the new score with an
>> old one, using bootstrap.
>>
>> We have used
>>
>>     mim, category(combine) est(r(area)) se(r(se)) : roctab diagnosis score1, summary
>>
>> to analyse the combined ROC area of the imputed datasets. However, we
>> want to compare two different models and would normally use roctab for
>> this, but this does not work with mim, category(combine).
>>
>> We also want to make a bootstrapped analysis of the diagnostic
>> properties of a new clinical score on the imputed dataset.
>>
>> We would appreciate any help on how to do the bootstrapping of the ROC
>> areas and comparing two areas on the imputed dataset.
>>
>> Regards
>>
>> Roland Andersson
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/