
Best subset selection via cross-validation criterion

  • Original Paper
  • Published in: TOP

Abstract

This paper is concerned with the cross-validation criterion for selecting the best subset of explanatory variables in a linear regression model. In contrast with statistical criteria (e.g., Mallows’ \(C_p\), the Akaike information criterion, and the Bayesian information criterion), cross-validation requires only mild assumptions, namely, that samples are identically distributed and that training and validation samples are independent. For this reason, the cross-validation criterion is expected to work well in most situations involving predictive methods. The purpose of this paper is to establish a mixed-integer optimization (MIO) approach to selecting the best subset of explanatory variables via the cross-validation criterion. This subset-selection problem can be formulated as a bilevel MIO problem, which we then reduce to a single-level mixed-integer quadratic optimization problem that can be solved exactly by optimization software. The efficacy of our method is evaluated through simulation experiments, in comparison with exhaustive search algorithms based on statistical criteria and with \(L_1\)-regularized regression. Our simulation results demonstrate that, when the signal-to-noise ratio was low, our method delivered good accuracy for both subset selection and prediction.
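To make the cross-validation criterion concrete, the following is a minimal sketch of subset selection scored by K-fold cross-validation for ordinary least squares. It uses naive exhaustive enumeration rather than the paper's bilevel/MIQO formulation, and all function names and the synthetic data are illustrative assumptions, not the authors' code:

```python
# Illustrative sketch: exhaustive best subset selection for linear
# regression, scored by K-fold cross-validation (NOT the paper's MIQO method).
import itertools
import numpy as np

def kfold_cv_error(X, y, cols, n_folds=5):
    """Mean squared validation error of OLS restricted to the given columns."""
    n = len(y)
    idx = np.arange(n)
    folds = np.array_split(idx, n_folds)
    total = 0.0
    for fold in folds:
        train = np.setdiff1d(idx, fold)  # training and validation are disjoint
        Xt = X[np.ix_(train, cols)]
        Xv = X[np.ix_(fold, cols)]
        beta, *_ = np.linalg.lstsq(Xt, y[train], rcond=None)
        total += np.sum((Xv @ beta - y[fold]) ** 2)
    return total / n

def best_subset_cv(X, y, max_size, n_folds=5):
    """Enumerate all subsets up to max_size; return (cv_error, column indices)."""
    p = X.shape[1]
    best = (np.inf, ())
    for k in range(1, max_size + 1):
        for cols in itertools.combinations(range(p), k):
            best = min(best, (kfold_cv_error(X, y, list(cols), n_folds), cols))
    return best

# Synthetic example: only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 6))
y = 2.0 * X[:, 0] - X[:, 3] + 0.1 * rng.standard_normal(60)
err, cols = best_subset_cv(X, y, max_size=3)
print(cols)  # the selected subset should contain columns 0 and 3
```

Enumeration costs \(O(2^p)\) model fits and is only feasible for small \(p\); the paper's contribution is precisely an exact MIO formulation that avoids this brute-force search.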


References

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723

  • Allen DM (1974) The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16(1):125–127

  • Arthanari TS, Dodge Y (1981) Mathematical programming in statistics. Wiley, New York

  • Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Stat Surv 4:40–79

  • Benati S, García S (2014) A mixed integer linear model for clustering with variable selection. Comput Oper Res 43:280–285

  • Bennett KP, Hu J, Ji X, Kunapuli G, Pang JS (2006) Model selection via bilevel optimization. In: Proceedings of the 2006 IEEE international joint conference on neural networks, pp 1922–1929

  • Bertsimas D, King A (2016) OR forum—an algorithmic approach to linear regression. Oper Res 64(1):2–16

  • Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann Stat 44(2):813–852

  • Bertsimas D, Dunn J (2017) Optimal classification trees. Mach Learn 106(7):1039–1082

  • Bertsimas D, King A (2017) Logistic regression: from art to science. Stat Sci 32(3):367–384

  • Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge

  • Chung S, Park YW, Cheong T (2017) A mathematical programming approach for integrated multiple linear regression subset selection and validation. arXiv preprint arXiv:1712.04543

  • Colson B, Marcotte P, Savard G (2007) An overview of bilevel optimization. Ann Oper Res 153(1):235–256

  • Cozad A, Sahinidis NV, Miller DC (2014) Learning surrogate models for simulation-based optimization. AIChE J 60(6):2211–2227

  • Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1–22

  • Geisser S (1975) The predictive sample reuse method with applications. J Am Stat Assoc 70(350):320–328

  • Hastie T, Tibshirani R, Tibshirani RJ (2017) Extended comparisons of best subset selection, forward stepwise selection, and the lasso. arXiv preprint arXiv:1707.08692

  • Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67

  • Hooker JN, Osorio MA (1999) Mixed logical-linear programming. Discrete Appl Math 96–97:395–442

  • Kimura K, Waki H (2018) Minimization of Akaike’s information criterion in linear regression analysis via mixed integer nonlinear program. Optim Methods Softw 33(3):633–649

  • Konno H, Yamamoto R (2009) Choosing the best set of variables in regression analysis using integer programming. J Glob Optim 44(2):273–282

  • Kunapuli G, Bennett KP, Hu J, Pang JS (2008) Classification model selection via bilevel programming. Optim Methods Softw 23(4):475–489

  • Maldonado S, Pérez J, Weber R, Labbé M (2014) Feature selection for support vector machines via mixed integer linear programming. Inf Sci 279:163–175

  • Mallows CL (1973) Some comments on \(C_p\). Technometrics 15(4):661–675

  • Miller A (2002) Subset selection in regression. Chapman and Hall, Boca Raton

  • Miyashiro R, Takano Y (2015a) Subset selection by Mallows’ \(C_p\): a mixed integer programming approach. Expert Syst Appl 42(1):325–331

  • Miyashiro R, Takano Y (2015b) Mixed integer second-order cone programming formulations for variable selection in linear regression. Eur J Oper Res 247(3):721–731

  • Mosier CI (1951) I. Problems and designs of cross-validation. Educ Psychol Meas 11(1):5–11

  • Naganuma M, Takano Y, Miyashiro R (2019) Feature subset selection for ordered logit model via tangent-plane-based approximation. IEICE Trans Inf Syst E102-D(5):1046–1053

  • Okuno T, Takeda A, Kawana A (2018) Hyperparameter learning for bilevel nonsmooth optimization. arXiv preprint arXiv:1806.01520

  • Park YW, Klabjan D (2017) Subset selection for multiple linear regression via optimization. arXiv preprint arXiv:1701.07920

  • Pedregosa F (2016) Hyperparameter optimization with approximate gradient. In: Proceedings of the 33rd international conference on machine learning, pp 737–746

  • Sato T, Takano Y, Miyashiro R, Yoshise A (2016) Feature subset selection for logistic regression via mixed integer optimization. Comput Optim Appl 64(3):865–880

  • Sato T, Takano Y, Miyashiro R (2017) Piecewise-linear approximation for feature subset selection in a sequential logit model. J Oper Res Soc Jpn 60(1):1–14

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Shao J (1993) Linear model selection by cross-validation. J Am Stat Assoc 88(422):486–494

  • Sinha A, Malo P, Deb K (2018) A review on bilevel optimization: from classical to evolutionary approaches and applications. IEEE Trans Evolut Comput 22(2):276–295

  • Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B Methodol 36(2):111–147

  • Tamura R, Kobayashi K, Takano Y, Miyashiro R, Nakata K, Matsui T (2017) Best subset selection for eliminating multicollinearity. J Oper Res Soc Jpn 60(3):321–336

  • Tamura R, Kobayashi K, Takano Y, Miyashiro R, Nakata K, Matsui T (2019) Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor. J Glob Optim 73(2):431–446

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 58:267–288

  • Ustun B, Rudin C (2016) Supersparse linear integer models for optimized medical scoring systems. Mach Learn 102(3):349–391

  • van Rijsbergen CJ (1979) Information retrieval, 2nd edn. Butterworth-Heinemann, Oxford

  • Wherry R (1931) A new formula for predicting the shrinkage of the coefficient of multiple correlation. Ann Math Stat 2(4):440–457

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67(2):301–320


Acknowledgements

The authors would like to thank two anonymous reviewers for their helpful comments. This work was partially supported by JSPS KAKENHI Grant Numbers JP17K01246 and JP17K12983.

Author information

Corresponding author

Correspondence to Yuichi Takano.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Takano, Y., Miyashiro, R. Best subset selection via cross-validation criterion. TOP 28, 475–488 (2020). https://doi.org/10.1007/s11750-020-00538-1

