Best subset selection via cross-validation criterion

  • Yuichi Takano [1]; Ryuhei Miyashiro [2]
    1. [1] University of Tsukuba, Japan
    2. [2] Tokyo University of Agriculture and Technology, Japan

  • Published in: Top, ISSN-e 1863-8279, ISSN 1134-5764, Vol. 28, No. 2, 2020, pp. 475-488
  • Language: English
  • DOI: 10.1007/s11750-020-00538-1
  • Abstract
    • This paper is concerned with the cross-validation criterion for selecting the best subset of explanatory variables in a linear regression model. In contrast with the use of statistical criteria (e.g., Mallows’ Cp, the Akaike information criterion, and the Bayesian information criterion), cross-validation requires only mild assumptions, namely, that samples are identically distributed and that training and validation samples are independent. For this reason, the cross-validation criterion is expected to work well in most situations involving predictive methods. The purpose of this paper is to establish a mixed-integer optimization (MIO) approach to selecting the best subset of explanatory variables via the cross-validation criterion. This subset-selection problem can be formulated as a bilevel MIO problem. We then reduce it to a single-level mixed-integer quadratic optimization problem, which can be solved exactly by using optimization software. The efficacy of our method is evaluated through simulation experiments by comparison with statistical-criterion-based exhaustive search algorithms and L1-regularized regression. Our simulation results demonstrate that, when the signal-to-noise ratio is low, our method delivers good accuracy for both subset selection and prediction.
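
    Note: as a rough, non-authoritative illustration of the selection criterion described in the abstract, the sketch below enumerates small variable subsets and scores each by K-fold cross-validation mean squared error. It is not the authors' bilevel/mixed-integer formulation (which solves the problem exactly with optimization software); it assumes NumPy and scikit-learn are available, and the function name best_subset_by_cv and all parameter values are illustrative choices.

        # Illustrative sketch only: brute-force best-subset selection scored by a
        # K-fold cross-validation criterion. NOT the paper's MIO/MIQO formulation.
        from itertools import combinations

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import KFold, cross_val_score


        def best_subset_by_cv(X, y, max_size=3, n_splits=5, random_state=0):
            """Return the column subset with the lowest K-fold CV mean squared error."""
            n_features = X.shape[1]
            cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
            best_subset, best_mse = None, np.inf
            for k in range(1, max_size + 1):
                for subset in combinations(range(n_features), k):
                    cols = list(subset)
                    # Negative MSE is returned by scikit-learn; flip the sign.
                    scores = cross_val_score(
                        LinearRegression(), X[:, cols], y,
                        scoring="neg_mean_squared_error", cv=cv,
                    )
                    mse = -scores.mean()
                    if mse < best_mse:
                        best_subset, best_mse = subset, mse
            return best_subset, best_mse


        if __name__ == "__main__":
            # Hypothetical data: 8 candidate variables, only columns 0 and 3 matter.
            rng = np.random.default_rng(0)
            X = rng.standard_normal((100, 8))
            y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.standard_normal(100)
            subset, mse = best_subset_by_cv(X, y, max_size=3)
            print(f"selected columns: {subset}, CV MSE: {mse:.3f}")

    Enumerating every subset like this grows exponentially with the number of candidate variables; the paper instead casts the same cross-validation-based search as a single-level mixed-integer quadratic optimization problem that off-the-shelf solvers can handle exactly.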

