Ir al contenido

Documat


Geometric optimization for classification problems

  • Autores: Pablo Perez Lantero
  • Directores de la Tesis: José Miguel Díaz-Báñez (dir. tes.) Árbol académico
  • Lectura: En la Universidad de Sevilla ( España ) en 2010
  • Idioma: inglés
  • Títulos paralelos:
    • Optimización geométrica para problemas de clasificación
  • Tribunal Calificador de la Tesis: Ferran Hurtado Díaz (presid.) Árbol académico, Inmaculada Ventura Molina (secret.) Árbol académico, Jorge Urrutia Galicia (voc.) Árbol académico, Manuel Abellanas Oar (voc.) Árbol académico, Sergey Bereg (voc.) Árbol académico
  • Enlaces
    • Tesis en acceso abierto en: Idus
  • Resumen
    • Data Mining is a relevant discipline in Computer Science, the main goal of which is to explore data and extract information that is potentially useful and previously unknown, By using mathematical tools, such as Operation Research, Statistics, Artificial Intelligence and more recently Computational Geometry, Data Mining solves problems in many areas where there are big databases. Within Computational Geometry, the techniques of Geometric Optimization can be applied to solve many problems in this field. Typically, problems in Data Mining concern data belonging to two classes, say red and blue, and mainly appear in important subareas such as the classification of new data and the recognition of patterns.

      This thesis focuses on the study of optimization problems with application in data classification and pattern recognition. In all of them, we are given a two-class data set represented as red and blue points in the plane, and the objective is to find simple geometric shapes meeting some requirements for classification. The problems are approached from the Computational Geometry point of view, and efficient algorithms that use the inherent geometry of the problems are proposed.

      A crucial problem in Data Mining is the so-called "Maximum Box Problem", where the geometric shape to be found is a maximum box, that is, an axis-aligned rectangle containing the maximum number of elements of only one class in the given data set. This thesis solves some natural variants of this basic problem by considering: two boxes (one per class), the minimum number of boxes to cover a class, or the maximum box in kinetic scenarios. Commonly, classification methods suppose a "good" data distribution, so a clustering procedure can be applied. However, if the classes are "well mixed", a clustering for selecting prototypes that represent a class is not possible. In that sense, this thesis studies a new parameter to measure, a priori, if a given two-class data set is suitable or not for classification.


Fundación Dialnet

Mi Documat

Opciones de tesis

Opciones de compartir

Opciones de entorno