On the optimism correction of the area under the receiver operating characteristic curve in logistic prediction models

Amaia Iparragirre Letamendia; Irantzu Barrio Beraza; María Xosé Rodríguez Álvarez

Authors: Amaia Iparragirre Letamendia, Irantzu Barrio Beraza, María Xosé Rodríguez Álvarez
Published in: Sort: Statistics and Operations Research Transactions, ISSN 1696-2281, Vol. 43, No. 1, 2019, pp. 145-162
Language: English
DOI: 10.2436/20.8080.02.82
Abstract
- When the same data are used both to fit a model and to estimate its predictive performance, the estimate may be optimistic and needs to be corrected. The aim of this work is to compare the behaviour of different methods proposed in the literature for correcting the optimism of the estimated area under the receiver operating characteristic curve in logistic regression models. A simulation study (where the theoretical model is known) is conducted considering different numbers of covariates, sample sizes, prevalences and degrees of correlation among covariates. The results suggest the use of k-fold cross-validation with replication and bootstrap.
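To make the idea of optimism correction concrete, the following is a minimal sketch of one common bootstrap scheme: refit the model on each bootstrap sample, take the difference between its AUC on that sample and its AUC on the original data, average those differences, and subtract the average from the apparent AUC. This is not the authors' code and may differ from the variants compared in the paper. All function names, the gradient-ascent fitting routine, and the simulated data (60 observations, 5 covariates, only the first one informative) are assumptions chosen for illustration.

```python
import math
import random


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; tied pairs count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def linear_predictor(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, steps=200, lr=0.5):
    """Illustrative fit: fixed-step gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # intercept followed by slopes
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, label in zip(X, y):
            resid = label - sigmoid(linear_predictor(beta, x))
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def model_auc(beta, X, y):
    # The AUC depends only on the ranking, so the linear predictor suffices.
    return auc([linear_predictor(beta, x) for x in X], y)


def bootstrap_corrected_auc(X, y, n_boot=30, rng=None):
    """Return (apparent AUC, mean bootstrap optimism, corrected AUC)."""
    rng = rng or random.Random(0)
    n = len(X)
    apparent = model_auc(fit_logistic(X, y), X, y)
    optimism = 0.0
    for _ in range(n_boot):
        # Redraw until the bootstrap sample contains both classes.
        while True:
            idx = [rng.randrange(n) for _ in range(n)]
            if 0 < sum(y[i] for i in idx) < n:
                break
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        beta_b = fit_logistic(Xb, yb)
        # Optimism: performance on the training sample minus on the original data.
        optimism += model_auc(beta_b, Xb, yb) - model_auc(beta_b, X, y)
    optimism /= n_boot
    return apparent, optimism, apparent - optimism


# Simulated data (assumed settings, not those of the paper).
sim = random.Random(2019)
X = [[sim.gauss(0.0, 1.0) for _ in range(5)] for _ in range(60)]
y = [1 if sim.random() < sigmoid(x[0]) else 0 for x in X]

apparent, optimism, corrected = bootstrap_corrected_auc(X, y)
print(f"apparent={apparent:.3f} optimism={optimism:.3f} corrected={corrected:.3f}")
```

In practice one would likely use an established logistic regression routine and many more bootstrap replicates; the small values here only keep the pure-Python sketch fast.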