Ir al contenido

Documat


Resumen de Geometric optimization for classification problems

Pablo Perez Lantero

  • Data Mining is a relevant discipline in Computer Science, the main goal of which is to explore data and extract information that is potentially useful and previously unknown, By using mathematical tools, such as Operation Research, Statistics, Artificial Intelligence and more recently Computational Geometry, Data Mining solves problems in many areas where there are big databases. Within Computational Geometry, the techniques of Geometric Optimization can be applied to solve many problems in this field. Typically, problems in Data Mining concern data belonging to two classes, say red and blue, and mainly appear in important subareas such as the classification of new data and the recognition of patterns.

    This thesis focuses on the study of optimization problems with application in data classification and pattern recognition. In all of them, we are given a two-class data set represented as red and blue points in the plane, and the objective is to find simple geometric shapes meeting some requirements for classification. The problems are approached from the Computational Geometry point of view, and efficient algorithms that use the inherent geometry of the problems are proposed.

    A crucial problem in Data Mining is the so-called "Maximum Box Problem", where the geometric shape to be found is a maximum box, that is, an axis-aligned rectangle containing the maximum number of elements of only one class in the given data set. This thesis solves some natural variants of this basic problem by considering: two boxes (one per class), the minimum number of boxes to cover a class, or the maximum box in kinetic scenarios. Commonly, classification methods suppose a "good" data distribution, so a clustering procedure can be applied. However, if the classes are "well mixed", a clustering for selecting prototypes that represent a class is not possible. In that sense, this thesis studies a new parameter to measure, a priori, if a given two-class data set is suitable or not for classification.


Fundación Dialnet

Mi Documat