Loss‑optimal classifcation trees: a generalized framework and the logistic case

Tommaso Aldinucci; Matteo Lapucci

Ayuda

Loss‑optimal classifcation trees: a generalized framework and the logistic case

Tommaso Aldinucci ^[1] ; Matteo Lapucci ^[1]
1. [1] University of Florence
  
  University of Florence
  
  Firenze, Italia
Localización: Top, ISSN-e 1863-8279, ISSN 1134-5764, Vol. 32, Nº. 2, 2024, págs. 323-350
Idioma: inglés
DOI: 10.1007/s11750-024-00674-y
Enlaces
- Texto completo
Resumen
- Classifcation trees are one of the most common models in interpretable machine learning. Although such models are usually built with greedy strategies, in recent years, thanks to remarkable advances in mixed-integer programming (MIP) solvers, several exact formulations of the learning problem have been developed. In this paper, we argue that some of the most relevant ones among these training models can be encapsulated within a general framework, whose instances are shaped by the specifcation of loss functions and regularizers. Next, we introduce a novel realization of this framework: specifcally, we consider the logistic loss, handled in the MIP setting by a piece-wise linear approximation, and couple it with l1-regularization terms. The resulting optimal logistic classifcation tree model numerically proves to be able to induce trees with enhanced interpretability properties and competitive generalization capabilities, compared to the state-of-the-art MIP-based approaches.
Referencias bibliográficas
- Aghaei S, Gómez A, Vayanos P (2021) Strong optimal classifcation trees. arXiv preprint arXiv:2103. 15965
- Alès Z, Huré V, Lambert A (2024) New optimization models for optimal classifcation trees. Comput Oper Res 164:106515
- Bach F, Jenatton R, Mairal J, Obozinski G et al (2012) Optimization with sparsity-inducing penalties. Found Trends Machine Learn 4(1):1–106...
- Bennett KP (1992) Decision tree construction via linear programming. Technical report, University of Wisconsin-Madison Department of Computer...
- Bennett KP, Blue J (1998) A support vector machine approach to decision trees. In: 1998 IEEE International Joint Conference on neural networks...
- Bennett KP, Mangasarian OL (1992) Robust linear programming discrimination of two linearly inseparable sets. Optimiz Methods Softw 1(1):23–34...
- Bennett KP, Mangasarian OL (1994) Multicategory discrimination via linear programming. Optimiz Methods Softw 3(1–3):27–39
- Bertsimas D, Dunn J (2017) Optimal classifcation trees. Mach Learn 106(7):1039–1082
- Bixby RE (2012) A brief history of linear and mixed-integer programming computation. Doc Math 2012:107–121
- Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2020) Sparsity in optimal randomized classifcation trees. Eur J Oper Res 284(1):255–272...
- Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2021) Optimal randomized classifcation trees. Comput Oper Res 132:105281
- Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2022) On sparse optimal regression trees. Eur J Oper Res 299(3):1045–1054
- Bohanec M, Bratko I (1994) Trading accuracy for simplicity in decision trees. Mach Learn 15:223–250
- Breiman L (2001) Random forests. Mach Learn 45(1):5–32
- Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classifcation and regression trees. Chapman & Hall/ CRC, Boca Raton
- Brodley CE, Utgof PE (1995) Multivariate decision trees. Mach Learn 19(1):45–77
- Carreira-Perpinán MA, Tavallali P (2018) Alternating optimization of decision trees, with application to learning sparse oblique trees. Adv...
- Carrizosa E, Molero-Río C, Romero Morales D (2021) Mathematical optimization in classifcation and regression trees. TOP 29(1):5–33
- Chan K-Y, Loh W-Y (2004) Lotus: an algorithm for building accurate and comprehensible logistic regression trees. J Comput Graph Stat 13(4):826–852...
- Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd Acm Sigkdd International Conference on knowledge...
- Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
- De Mántaras RL (1991) A distance-based attribute selection measure for decision tree induction. Mach Learn 6(1):81–92
- Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profles. Math Program 91:201–213
- D’Onofrio F, Grani G, Monaci M, Palagi L (2024) Margin optimal classifcation trees. Comput Oper Res 161:106441
- Dua D, Graf C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed May 2023
- Dunn JW (2018) Optimal trees for prediction and prescription. PhD thesis, Massachusetts Institute of Technology
- Figueiredo MA, Nowak RD, Wright SJ (2007) Gradient projection for sparse reconstruction: application to compressed sensing and other inverse...
- Florio AM, Martins P, Schifer M, Serra T, Vidal T (2023) Optimal decision diagrams for classifcation. In: Proceedings of the AAAI Conference...
- Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
- Friedman JH et al (1977) A recursive partitioning decision rule for nonparametric classifcation. IEEE Trans Comput 26(4):404–408
- Günlük O, Kalagnanam J, Li M, Menickelly M, Scheinberg K (2021) Optimal decision trees for categorical data via integer programming. J Global...
- Gurobi Optimization LLC (2022) Gurobi optimizer reference manual. https://www.gurobi.com. Accessed May 2023
- Hastie T, Tibshirani R, Friedman JH, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, vol...
- Hu X, Rudin C, Seltzer M (2019) Optimal sparse decision trees. Adv Neural Inform Process Syst 32:7267–7275
- John GH (1995) Robust linear discriminant trees. In: Pre-proceedings of the Fifth International Workshop on artifcial intelligence and statistics,...
- Jovanovic M, Radovanovic S, Vukicevic M, Van Poucke S, Delibasic B (2016) Building interpretable predictive models for pediatric hospital...
- Kamiya S, Miyashiro R, Takano Y (2019) Feature subset selection for the multinomial logit model via mixed-integer optimization. In: The 22nd...
- Kotsiantis SB (2013) Decision trees: a recent overview. Artif Intell Rev 39(4):261–283
- Landwehr N, Hall M, Frank E (2005) Logistic model trees. Mach Learn 59:161–205
- Laurent H, Rivest RL (1976) Constructing optimal binary decision trees is NP-complete. Inf Process Lett 5(1):15–17
- Lin J, Zhong C, Hu D, Rudin C, Seltzer M (2020) Generalized and scalable optimal sparse decision trees. In: International Conference on machine...
- Liu E, Hu T, Allen TT, Hermes C (2023) Optimal classifcation trees with leaf-branch and binary constraints applied to pipeline inspection....
- Loh W-Y, Vanichsetakul N (1988) Tree-structured classifcation via generalized discriminant analysis. J Am Stat Assoc 83(403):715–725
- Murthy SK, Salzberg S (1995) Decision tree induction: How efective is the greedy heuristic? In: KDD, pp 222–227
- Murthy SK, Kasif S, Salzberg S (1994) A system for induction of oblique decision trees. J Artif Intell Res 2:1–32
- Norouzi M, Collins M, Johnson MA, Fleet DJ, Kohli P (2015) Efcient non-greedy optimization of decision trees. Adv Neural Inform Process Syst...
- Orsenigo C, Vercellis C (2003) Multivariate classifcation trees based on minimum features discrete support vector machines. IMA J Manag Math...
- Patel KK, Desaulniers G, Lodi A (2024) An improved column-generation-based matheuristic for learning classifcation trees. Comput Oper Res,...
- Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A,...
- Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
- Quinlan JR (1987) Simplifying decision trees. Int J Man Mach Stud 27(3):221–234
- Ribeiro MT, Singh S, Guestrin C (2016) "Why should I trust you?" Explaining the predictions of any classifer. In: Proceedings of the...
- Rokach L, Maimon O (2005) Top-down induction of decision trees classifers-a survey. IEEE Trans Syst Man Cybern Part C (Applications and Reviews)...
- Ross A, Lage I, Doshi-Velez F (2017) The neural lasso: local linear sparsity for interpretable explanations. In: Workshop on Transparent and...
- Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach...
- Sato T, Takano Y, Miyashiro R, Yoshise A (2016) Feature subset selection for logistic regression via mixed integer optimization. Comput Optim...
- Song Y-Y, Ying L (2015) Decision tree methods: applications for classifcation and prediction. Shanghai Arch Psychiatry 27(2):130
- Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B (Methodol) 58(1):267–288
- Tibshirani R, Hastie T (2007) Margin trees for high-dimensional classifcation. J Mach Learn Res 8(3):637–652
- Verwer S, Zhang Y (2019) Learning optimal classifcation trees using a binary linear program formulation. In: Proceedings of the AAAI Conference...