Optimism correction of the area under the ROC curve, with missing data

Susana Rafaela Martins ^[2] ; María del Carmen Iglesias-Pérez ^[1] ; Jacobo de Uña-Álvarez ^[1]
1. [1] Universidade de Vigo, Vigo, Spain
2. [2] Escola Superior de Desporto e Lazer, Instituto Politécnico de Viana do Castelo, Portugal
Source: Sort: Statistics and Operations Research Transactions, ISSN 1696-2281, Vol. 49, No. 2, 2025, pp. 179-212
Language: English
Abstract
- The area under the ROC curve (AUC) plays an important role in the study of the predictive capacity of regression models. It is well known that an inflated AUC may result when the same data are used for training and testing the model. In this paper, optimism correction of the AUC in the presence of missing data is investigated. Complete case analysis, inverse probability weighting and multiple imputation are employed to address the issue of missing data. For each of these approaches, split-sample, K-fold cross-validation and leave-one-out cross-validation are employed to correct for the optimism of the AUC. The methods are compared through intensive Monte Carlo simulations in the particular setting of binary regression. Results suggest that all estimators are consistent with the exception of complete case analysis, which may be biased when the data are not missing completely at random. In general, a combined application of multiple imputation and leave-one-out cross-validation is recommended.
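
To make the notion of optimism concrete, the following is a minimal sketch, not the authors' implementation. It contrasts the apparent AUC (model fitted and evaluated on the same data) with a leave-one-out cross-validated AUC for a logistic regression. Everything in it is an assumption made for illustration: the simulated data (one informative covariate and two noise covariates, fully observed), the gradient-ascent fitting routine, the function names, and the choice to pool the held-out predictions before computing a single AUC, which is only one of several ways to define a cross-validated AUC. Missing data handling (complete case analysis, inverse probability weighting, multiple imputation) is deliberately left out.

```python
import math
import random


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; tied scores count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, x):
    # Predicted probability for covariate vector x; beta[0] is the intercept.
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(X, y, steps=200, lr=0.5):
    """Logistic regression by plain gradient ascent (illustrative, not tuned)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            r = yi - predict(beta, xi)
            grad[0] += r
            for j in range(p):
                grad[j + 1] += r * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def apparent_auc(X, y):
    # Same data used for fitting and for evaluation: prone to optimism.
    beta = fit_logistic(X, y)
    return auc([predict(beta, x) for x in X], y)


def loo_auc(X, y):
    # Each observation is scored by a model fitted without it;
    # the held-out scores are pooled into one AUC (an assumed convention).
    scores = []
    for i in range(len(X)):
        beta = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(predict(beta, X[i]))
    return auc(scores, y)


# Hypothetical simulated data: only the first covariate affects the outcome.
random.seed(1)
n = 40
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [1 if random.random() < sigmoid(x[0]) else 0 for x in X]

apparent = apparent_auc(X, y)
loo = loo_auc(X, y)
print(f"apparent AUC = {apparent:.3f}, LOO AUC = {loo:.3f}")
```

The difference between the two printed values is one rough estimate of the optimism in this toy setting. Under the same assumptions, split-sample and K-fold variants would only change how the training and held-out indices are formed inside `loo_auc`.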