This dissertation focuses on developing new approaches to several Data Science problems from a Location Theory perspective. In particular, we concentrate on locating hyperplanes by solving Mixed Integer Linear and Nonlinear Programming problems. Chapter 1 introduces the baseline techniques involved in this work: Support Vector Machines, Decision Trees, and Fitting Hyperplanes Theory.

In Chapter 2 we study the problem of locating a set of hyperplanes for multiclass classification, extending the binary Support Vector Machine paradigm. We present four Mathematical Programming formulations that allow us to vary both the error measures involved in the problems and the norms used to measure distances. We report an extensive battery of computational experiments over real and synthetic datasets that reveals the effectiveness of our approach. Moreover, we prove that the kernel trick is applicable in our method.

Chapter 3 also focuses on locating a set of hyperplanes, in this case minimizing an objective function of the closest distances from a set of points. The problem is treated in a general framework in which norm-based distances between points and hyperplanes are aggregated by means of ordered median functions. We present a compact formulation as well as a set partitioning one, and develop a column generation procedure to solve the latter. We report the results of an extensive computational study, together with theoretical results on scalability issues and a geometrical analysis of the optimal solutions.

Chapter 4 addresses the problem of finding a separating hyperplane for binary classification problems in which label noise is assumed to occur in the training sample. We derive three methodologies, two of them based on clustering techniques, which incorporate the ability to relabel observations, i.e., to treat them as if they belonged to the contrary class, during the training process.
We report computational experiments showing that our methodologies obtain higher accuracies when the training samples contain label noise.

Chapters 5 and 6 consider the problem of locating a set of hyperplanes, following the Support Vector Machine classification principles, in the context of Classification Trees. The methodologies developed in both chapters inherit properties from Chapter 4 that play an important role in the problem formulations. On the one hand, Chapter 5 focuses on binary classification problems where label noise can occur in the training samples. On the other hand, Chapter 6 focuses on the multiclass classification problem. Both chapters present computational experiments showing that the derived methodologies outperform other Classification Tree methods. Finally, Chapter 7 presents the conclusions of this thesis.
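To give a feel for the aggregation scheme mentioned for Chapter 3, the following minimal sketch shows how an ordered median function combines a set of point-to-hyperplane distances: the distances are sorted in non-decreasing order and weighted by a lambda-vector, so particular choices of the weights recover classical Location Theory objectives. The function name and weight choices here are illustrative assumptions, not the thesis' actual formulation.

```python
# Illustrative (hypothetical) ordered median aggregation of distances.
# Different lambda-weight vectors recover classical criteria.

def ordered_median(distances, weights):
    """Sort distances in non-decreasing order, then return the
    weighted sum using the given lambda-weights."""
    if len(distances) != len(weights):
        raise ValueError("distances and weights must have equal length")
    return sum(w * d for w, d in zip(weights, sorted(distances)))

dists = [3.0, 1.0, 4.0, 2.0]

# lambda = (1, 1, 1, 1): sum of all distances (median-type objective)
assert ordered_median(dists, [1, 1, 1, 1]) == 10.0

# lambda = (0, 0, 0, 1): largest distance only (center-type objective)
assert ordered_median(dists, [0, 0, 0, 1]) == 4.0

# lambda = (0, 0, 1, 1): sum of the two largest distances (k-centrum)
assert ordered_median(dists, [0, 0, 1, 1]) == 7.0
```

Because the weights act on the *sorted* distances, a single parametric objective interpolates between robust (median-like) and worst-case (center-like) location criteria.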