Resumen de Funciones de similitud para datos estructurados: una aplicación a los árboles de decisión.

Learning from structured data is becoming increasingly important. Besides the well-known approaches which deal directly with complex data representations (inductive logic programming and multi-relational data mining), new techniques have been recently proposed by upgrading propositional learning algorithms. Focusing on distance-based methods, these techniques are extended by incorporating similarity functions defined over structured domains, for instance a k-NN algorithm solving a graph classification problem. Since a measure between objects is the essential component for this kind of methods, this paper starts with a description of some of the recent similarity functions defined over common structured data (lists, sets, terms, etc.). However, many of the most common classification techniques, such as decision tree learning, are not distance-based methods or cannot be directly adapted to be so (as kernel methods and neural networks have been adapted). In this work, we extend decision trees to use any kind of similarity function. The method is inspired by centre splitting, which constructs decision trees by defining splits based on the distance to two or more centroids. We include an experimental analysis with both propositional data and complex data. Apart from the advantages of the new proposed method, it can be used as an example of how other partition-based methods can be adapted to deal with distances and, hence, with structured data.

Acceso de usuarios registrados

¿Es nuevo? Regístrese

Coordinado por: