
Documat


Abstract of Learning similarities between pairs of objects through supervised classification

Emilia López-Iñesta

  • The use of similarity measures, distances, or metrics is central to many standard classification techniques, and constructing them is a fundamental task in Machine Learning and Pattern Recognition. Since the similarity between two objects can vary greatly with context, constructing these measures intelligently from the available data can help produce more robust classifiers and improve results on the specific task to be solved.

    In recent years, Metric Learning and Similarity Learning techniques have received growing interest from the scientific community. Given information in the form of examples labeled with a category or class, the main goal of Metric Learning is to learn a distance metric according to the following principle: the distances between similar pairs (i.e., pairs of objects with the same class) must be small, while the distances between dissimilar pairs (i.e., pairs of objects with different classes) must be larger. Likewise, Similarity Learning attempts to learn a similarity function that assigns large scores to similar pairs and small scores to dissimilar pairs. A particular case of Similarity Learning is Classification-based Similarity Learning, which uses classification methods to learn similarity measures. In all these methods, performance depends to a great extent on the feature representation of the available data.

    Thus, this thesis presents an enriched classification method that follows a hybrid approach combining Feature Extraction and Feature Expansion techniques. In particular, we propose a data transformation and the use of a set of metric and non-metric distances to complement the information provided by the feature vectors of the training examples. While this increases the dimensionality of the problem, it also injects additional knowledge, because distance measures implicitly match the features of two objects against each other. In addition, we analyze whether the added information compensates for the increase in dimensionality, as well as the influence of different input data formats and training set sizes on classifier performance.

    The proposal is compared with Metric Learning methods, and the results show comparable performance, favoring the proposed method, across different contexts and databases.
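
The pair representation described in the abstract can be illustrated with a minimal sketch. The abstract does not name the data transformation or the specific distances, so everything concrete here is an assumption made for illustration: the element-wise absolute difference stands in for the transformation, Euclidean, Manhattan, and Chebyshev distances stand in for the metric distances, cosine distance stands in for the non-metric ones, and the function names `pair_features` and `build_pair_dataset` are hypothetical.

```python
import math
from itertools import combinations


def pair_features(x, y):
    """Expanded representation of the pair (x, y).

    Assumption: the element-wise absolute difference stands in for the
    thesis's data transformation, and the four distances below stand in
    for its set of metric and non-metric distances.
    """
    diff = [abs(a - b) for a, b in zip(x, y)]
    euclidean = math.sqrt(sum(d * d for d in diff))
    manhattan = sum(diff)
    chebyshev = max(diff)
    # Cosine distance is not a metric: it can violate the triangle inequality.
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    dot = sum(a * b for a, b in zip(x, y))
    # Convention chosen here: distance 1.0 when either vector is all zeros.
    cosine = 1.0 - dot / (norm_x * norm_y) if norm_x and norm_y else 1.0
    # Original d features of the pair, expanded with 4 distance features.
    return diff + [euclidean, manhattan, chebyshev, cosine]


def build_pair_dataset(samples, labels):
    """Turn labeled objects into labeled pairs: 1 = same class, 0 = different."""
    features, targets = [], []
    for i, j in combinations(range(len(samples)), 2):
        features.append(pair_features(samples[i], samples[j]))
        targets.append(1 if labels[i] == labels[j] else 0)
    return features, targets


# Toy data (invented for the example): two objects of class "a", one of class "b".
samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
labels = ["a", "a", "b"]
X_pairs, y_pairs = build_pair_dataset(samples, labels)
```

Under these assumptions, `X_pairs` and `y_pairs` could be passed to any binary classifier, whose output score would then play the role of the learned similarity. Each pair grows from d to d + 4 features, which is the dimensionality trade-off the abstract refers to.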

