


Dataset Weighting via Intrinsic Data Characteristics for Pairwise Statistical Comparisons in Classification

  • José A. Sáez [1] ; Pablo Villacorta [2] ; Emilio Corchado [1]
    1. [1] Universidad de Salamanca, Salamanca, Spain
    2. [2] Universidad de Granada, Granada, Spain

  • Published in: Hybrid Artificial Intelligent Systems. 14th International Conference, HAIS 2019: León, Spain, September 4–6, 2019. Proceedings / edited by Hilde Pérez García, Lidia Sánchez González, Manuel Castejón Limas, Héctor Quintián Pardo, Emilio Santiago Corchado Rodríguez, 2019, ISBN 978-3-030-29858-6, pp. 61-72
  • Language: English
  • Abstract
    • In supervised learning, some data characteristics (e.g. the presence of errors or the degree of overlap) may negatively influence classifier performance. Many methods are designed to overcome the undesirable effects of these issues. When comparing one such technique with existing ones, datasets must be selected according to how well each one reflects the characteristic specifically addressed by the proposed algorithm. In this setting, statistical tests are necessary to check the significance of the differences found between the methods compared. Wilcoxon’s signed-ranks test is one of the best-known statistical tests for pairwise comparisons between classifiers. However, it gives the same importance to every dataset, disregarding how representative each one is of the concrete issue addressed by the methods compared. This research proposes a hybrid approach that combines data characterization measures with statistical tests for decision making in data mining: each dataset is weighted according to its representativeness of the property of interest before Wilcoxon’s test is applied. The proposal has been successfully compared with the standard Wilcoxon’s test in two scenarios related to the noisy data problem. As a result, the approach makes it easier to highlight properties of the algorithms that may otherwise remain hidden if data characteristics are not considered in the comparison.
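
The abstract does not spell out how the weights enter the test, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes that each dataset's signed rank is multiplied by a user-supplied weight and that significance is estimated with a Monte Carlo sign-flip permutation test. The function names (`average_ranks`, `weighted_signed_rank_test`), the parameters `n_perm` and `seed`, and the example scores and weights are all hypothetical.

```python
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def weighted_signed_rank_test(scores_a, scores_b, weights, n_perm=10000, seed=0):
    """Weighted signed-rank statistic with a two-sided sign-flip p-value.

    Assumption: weights[i] expresses how representative dataset i is of the
    property of interest (for example, its estimated noise level).
    """
    # Drop datasets where both methods score the same, as in the standard test.
    pairs = [(a - b, w) for a, b, w in zip(scores_a, scores_b, weights) if a != b]
    diffs = [d for d, _ in pairs]
    ranks = average_ranks([abs(d) for d in diffs])
    # Each dataset contributes its rank scaled by its weight.
    contrib = [w * r for (_, w), r in zip(pairs, ranks)]
    observed = sum(c if d > 0 else -c for d, c in zip(diffs, contrib))

    # Under the null hypothesis the sign of each difference is arbitrary,
    # so random sign flips approximate the null distribution.
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        stat = sum(c if rng.random() < 0.5 else -c for c in contrib)
        if abs(stat) >= abs(observed):
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical accuracies of two classifiers on five datasets.
acc_a = [0.9, 0.8, 0.7, 0.6, 0.1]
acc_b = [0.8, 0.7, 0.6, 0.5, 0.9]
uniform = weighted_signed_rank_test(acc_a, acc_b, [1, 1, 1, 1, 1])
focused = weighted_signed_rank_test(acc_a, acc_b, [0, 0, 0, 0, 1])
print("uniform weights:", uniform)
print("weight on last dataset only:", focused)
```

With uniform weights the statistic reduces to the usual difference between positive and negative rank sums. Concentrating the weight on the datasets that best represent the property of interest can change both the sign and the significance of the comparison, which is the effect the abstract describes.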

