Best subset selection via cross-validation criterion

  • Yuichi Takano [1]; Ryuhei Miyashiro [2]
    1. [1] University of Tsukuba, Japan
    2. [2] Tokyo University of Agriculture and Technology, Japan

  • Published in: Top, ISSN-e 1863-8279, ISSN 1134-5764, Vol. 28, No. 2, 2020, pp. 475-488
  • Language: English
  • DOI: 10.1007/s11750-020-00538-1
  • Abstract
    • This paper is concerned with the cross-validation criterion for selecting the best subset of explanatory variables in a linear regression model. In contrast with the use of statistical criteria (e.g., Mallows’ Cp, the Akaike information criterion, and the Bayesian information criterion), cross-validation requires only mild assumptions, namely, that samples are identically distributed and that training and validation samples are independent. For this reason, the cross-validation criterion is expected to work well in most situations involving predictive methods. The purpose of this paper is to establish a mixed-integer optimization (MIO) approach to selecting the best subset of explanatory variables via the cross-validation criterion. This subset-selection problem can be formulated as a bilevel MIO problem. We then reduce it to a single-level mixed-integer quadratic optimization problem, which can be solved exactly by using optimization software. The efficacy of our method is evaluated through simulation experiments by comparison with statistical-criterion-based exhaustive search algorithms and L1-regularized regression. Our simulation results demonstrate that, when the signal-to-noise ratio is low, our method delivers good accuracy for both subset selection and prediction.
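
    Note: as a rough, non-authoritative illustration of the selection criterion described in the abstract, the sketch below enumerates small variable subsets and scores each by K-fold cross-validation mean squared error. It is not the authors' bilevel/mixed-integer formulation (which solves the problem exactly with optimization software); it assumes NumPy and scikit-learn are available, and the function name best_subset_by_cv and all parameter values are illustrative choices.

        # Illustrative sketch only: brute-force best-subset selection scored by a
        # K-fold cross-validation criterion. NOT the paper's MIO/MIQO formulation.
        from itertools import combinations

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import KFold, cross_val_score


        def best_subset_by_cv(X, y, max_size=3, n_splits=5, random_state=0):
            """Return the column subset with the lowest K-fold CV mean squared error."""
            n_features = X.shape[1]
            cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
            best_subset, best_mse = None, np.inf
            for k in range(1, max_size + 1):
                for subset in combinations(range(n_features), k):
                    cols = list(subset)
                    # Negative MSE is returned by scikit-learn; flip the sign.
                    scores = cross_val_score(
                        LinearRegression(), X[:, cols], y,
                        scoring="neg_mean_squared_error", cv=cv,
                    )
                    mse = -scores.mean()
                    if mse < best_mse:
                        best_subset, best_mse = subset, mse
            return best_subset, best_mse


        if __name__ == "__main__":
            # Hypothetical data: 8 candidate variables, only columns 0 and 3 matter.
            rng = np.random.default_rng(0)
            X = rng.standard_normal((100, 8))
            y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.standard_normal(100)
            subset, mse = best_subset_by_cv(X, y, max_size=3)
            print(f"selected columns: {subset}, CV MSE: {mse:.3f}")

    Enumerating every subset like this grows exponentially with the number of candidate variables; the paper instead casts the same cross-validation-based search as a single-level mixed-integer quadratic optimization problem that off-the-shelf solvers can handle exactly.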

