


On generalized Gower distance for mixed-type data: extensive simulation study and new software tools

  • Aurea Grané [1]; Fabio Scielzo-Ortiz [1]
    1. [1] Universidad Carlos III de Madrid, Madrid, Spain

  • Published in: Sort: Statistics and Operations Research Transactions, ISSN 1696-2281, Vol. 49, No. 2, 2025, pp. 213-244
  • Language: English
  • Abstract
    • Data scientists address real-world problems using multivariate, heterogeneous datasets characterized by many variables of different types. Selecting a suitable distance function between units is crucial, as many statistical techniques and machine learning algorithms rely on this concept. Traditional distances, such as the Euclidean or Manhattan distance, are unsuitable for mixed-type data, and although the Gower distance was designed to handle this kind of data, it may lead to suboptimal results in the presence of outlying units or an underlying correlation structure. In this work, robust distances for mixed-type data are defined and explored, namely robust generalized Gower and robust related metric scaling. A new Python package is developed that makes it possible to compute these robust proposals as well as the classical ones.
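As a point of reference for the classical distance the abstract starts from, here is a minimal sketch of a simplified Gower dissimilarity (1 - s_ij) in plain Python. It assumes equal variable weights, no missing values, and simple matching for categorical and binary variables; Gower's original coefficient treats asymmetric binary variables differently, and some authors use sqrt(1 - s_ij) instead. The names `gower_distance` and `column_ranges` are hypothetical and are not the API of the package described in the article.

```python
from typing import Sequence, Union

Value = Union[int, float, str]


def column_ranges(data: Sequence[Sequence[Value]], kinds: Sequence[str]) -> list:
    """Range R_k of each quantitative column (0.0 placeholder for categorical ones)."""
    ranges = []
    for k, kind in enumerate(kinds):
        if kind == "num":
            col = [row[k] for row in data]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(0.0)
    return ranges


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    kinds: Sequence[str],
    ranges: Sequence[float],
) -> float:
    """Simplified Gower dissimilarity between two units (hypothetical helper).

    kinds[k] is "num" for a quantitative variable and "cat" for a
    categorical or binary one; ranges[k] is R_k over the whole data set.
    """
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-normalised absolute difference, which lies in [0, 1].
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
    # Equal weights: average the per-variable dissimilarities.
    return total / len(kinds)


# Toy mixed-type data (invented for illustration):
# height in cm, weight in kg, group label, yes/no answer.
data = [
    (170, 65, "A", "yes"),
    (180, 80, "B", "no"),
    (160, 50, "A", "no"),
]
kinds = ("num", "num", "cat", "cat")
R = column_ranges(data, kinds)

d01 = gower_distance(data[0], data[1], kinds, R)
print(d01)  # -> 0.75
```

Because each quantitative difference is divided by the range R_k, a single extreme value inflates R_k and shrinks every other difference on that variable toward 0. This is one way outlying units can distort the classical distance.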

