
Best subset selection via cross-validation criterion

  • Original Paper
  • Published in: TOP

Abstract

This paper is concerned with the cross-validation criterion for selecting the best subset of explanatory variables in a linear regression model. In contrast with statistical criteria (e.g., Mallows’ \(C_p\), the Akaike information criterion, and the Bayesian information criterion), cross-validation requires only mild assumptions, namely, that samples are identically distributed and that training and validation samples are independent. For this reason, the cross-validation criterion is expected to work well in most situations involving predictive methods. The purpose of this paper is to establish a mixed-integer optimization (MIO) approach to selecting the best subset of explanatory variables via the cross-validation criterion. This subset-selection problem can be formulated as a bilevel MIO problem, which we then reduce to a single-level mixed-integer quadratic optimization problem that can be solved exactly by optimization software. The efficacy of our method is evaluated through simulation experiments, in comparison with exhaustive search algorithms based on statistical criteria and with \(L_1\)-regularized regression. Our simulation results demonstrate that, when the signal-to-noise ratio was low, our method delivered good accuracy for both subset selection and prediction.
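To make the cross-validation criterion concrete, the following is a minimal sketch of subset selection scored by K-fold cross-validation for ordinary least squares. It uses naive exhaustive enumeration rather than the paper's bilevel/MIQO formulation, and all function names and the synthetic data are illustrative assumptions, not the authors' code:

```python
# Illustrative sketch: exhaustive best subset selection for linear
# regression, scored by K-fold cross-validation (NOT the paper's MIQO method).
import itertools
import numpy as np

def kfold_cv_error(X, y, cols, n_folds=5):
    """Mean squared validation error of OLS restricted to the given columns."""
    n = len(y)
    idx = np.arange(n)
    folds = np.array_split(idx, n_folds)
    total = 0.0
    for fold in folds:
        train = np.setdiff1d(idx, fold)  # training and validation are disjoint
        Xt = X[np.ix_(train, cols)]
        Xv = X[np.ix_(fold, cols)]
        beta, *_ = np.linalg.lstsq(Xt, y[train], rcond=None)
        total += np.sum((Xv @ beta - y[fold]) ** 2)
    return total / n

def best_subset_cv(X, y, max_size, n_folds=5):
    """Enumerate all subsets up to max_size; return (cv_error, column indices)."""
    p = X.shape[1]
    best = (np.inf, ())
    for k in range(1, max_size + 1):
        for cols in itertools.combinations(range(p), k):
            best = min(best, (kfold_cv_error(X, y, list(cols), n_folds), cols))
    return best

# Synthetic example: only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 6))
y = 2.0 * X[:, 0] - X[:, 3] + 0.1 * rng.standard_normal(60)
err, cols = best_subset_cv(X, y, max_size=3)
print(cols)  # the selected subset should contain columns 0 and 3
```

Enumeration costs \(O(2^p)\) model fits and is only feasible for small \(p\); the paper's contribution is precisely an exact MIO formulation that avoids this brute-force search.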


References

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723

  • Allen DM (1974) The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16(1):125–127

  • Arthanari TS, Dodge Y (1981) Mathematical programming in statistics. Wiley, New York

  • Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Stat Surv 4:40–79

  • Benati S, García S (2014) A mixed integer linear model for clustering with variable selection. Comput Oper Res 43:280–285

  • Bennett KP, Hu J, Ji X, Kunapuli G, Pang JS (2006) Model selection via bilevel optimization. In: Proceedings of the 2006 IEEE international joint conference on neural networks, pp 1922–1929

  • Bertsimas D, King A (2016) OR forum—an algorithmic approach to linear regression. Oper Res 64(1):2–16

  • Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann Stat 44(2):813–852

  • Bertsimas D, Dunn J (2017) Optimal classification trees. Mach Learn 106(7):1039–1082

  • Bertsimas D, King A (2017) Logistic regression: from art to science. Stat Sci 32(3):367–384

  • Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge

  • Chung S, Park YW, Cheong T (2017) A mathematical programming approach for integrated multiple linear regression subset selection and validation. arXiv preprint arXiv:1712.04543

  • Colson B, Marcotte P, Savard G (2007) An overview of bilevel optimization. Ann Oper Res 153(1):235–256

  • Cozad A, Sahinidis NV, Miller DC (2014) Learning surrogate models for simulation-based optimization. AIChE J 60(6):2211–2227

  • Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1–22

  • Geisser S (1975) The predictive sample reuse method with applications. J Am Stat Assoc 70(350):320–328

  • Hastie T, Tibshirani R, Tibshirani RJ (2017) Extended comparisons of best subset selection, forward stepwise selection, and the lasso. arXiv preprint arXiv:1707.08692

  • Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67

  • Hooker JN, Osorio MA (1999) Mixed logical-linear programming. Discrete Appl Math 96–97:395–442

  • Kimura K, Waki H (2018) Minimization of Akaike’s information criterion in linear regression analysis via mixed integer nonlinear program. Optim Methods Softw 33(3):633–649

  • Konno H, Yamamoto R (2009) Choosing the best set of variables in regression analysis using integer programming. J Glob Optim 44(2):273–282

  • Kunapuli G, Bennett KP, Hu J, Pang JS (2008) Classification model selection via bilevel programming. Optim Methods Softw 23(4):475–489

  • Maldonado S, Pérez J, Weber R, Labbé M (2014) Feature selection for support vector machines via mixed integer linear programming. Inf Sci 279:163–175

  • Mallows CL (1973) Some comments on \(C_p\). Technometrics 15(4):661–675

  • Miller A (2002) Subset selection in regression. Chapman and Hall, Boca Raton

  • Miyashiro R, Takano Y (2015a) Subset selection by Mallows’ \(C_p\): a mixed integer programming approach. Expert Syst Appl 42(1):325–331

  • Miyashiro R, Takano Y (2015b) Mixed integer second-order cone programming formulations for variable selection in linear regression. Eur J Oper Res 247(3):721–731

  • Mosier CI (1951) I. Problems and designs of cross-validation. Educ Psychol Meas 11(1):5–11

  • Naganuma M, Takano Y, Miyashiro R (2019) Feature subset selection for ordered logit model via tangent-plane-based approximation. IEICE Trans Inf Syst E102-D(5):1046–1053

  • Okuno T, Takeda A, Kawana A (2018) Hyperparameter learning for bilevel nonsmooth optimization. arXiv preprint arXiv:1806.01520

  • Park YW, Klabjan D (2017) Subset selection for multiple linear regression via optimization. arXiv preprint arXiv:1701.07920

  • Pedregosa F (2016) Hyperparameter optimization with approximate gradient. In: Proceedings of the 33rd international conference on machine learning, pp 737–746

  • Sato T, Takano Y, Miyashiro R, Yoshise A (2016) Feature subset selection for logistic regression via mixed integer optimization. Comput Optim Appl 64(3):865–880

  • Sato T, Takano Y, Miyashiro R (2017) Piecewise-linear approximation for feature subset selection in a sequential logit model. J Oper Res Soc Jpn 60(1):1–14

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Shao J (1993) Linear model selection by cross-validation. J Am Stat Assoc 88(422):486–494

  • Sinha A, Malo P, Deb K (2018) A review on bilevel optimization: from classical to evolutionary approaches and applications. IEEE Trans Evolut Comput 22(2):276–295

  • Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B Methodol 36(2):111–147

  • Tamura R, Kobayashi K, Takano Y, Miyashiro R, Nakata K, Matsui T (2017) Best subset selection for eliminating multicollinearity. J Oper Res Soc Jpn 60(3):321–336

  • Tamura R, Kobayashi K, Takano Y, Miyashiro R, Nakata K, Matsui T (2019) Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor. J Glob Optim 73(2):431–446

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 58:267–288

  • Ustun B, Rudin C (2016) Supersparse linear integer models for optimized medical scoring systems. Mach Learn 102(3):349–391

  • van Rijsbergen CJ (1979) Information retrieval, 2nd edn. Butterworth-Heinemann, Oxford

  • Wherry R (1931) A new formula for predicting the shrinkage of the coefficient of multiple correlation. Ann Math Stat 2(4):440–457

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67(2):301–320


Acknowledgements

The authors would like to thank two anonymous reviewers for their helpful comments. This work was partially supported by JSPS KAKENHI Grant Numbers JP17K01246 and JP17K12983.

Author information

Corresponding author

Correspondence to Yuichi Takano.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Takano, Y., Miyashiro, R. Best subset selection via cross-validation criterion. TOP 28, 475–488 (2020). https://doi.org/10.1007/s11750-020-00538-1

