
Documat


Distribution-free tests for lossless feature selection in classification and regression

  • László Györfi [1] ; Tamás Linder [2] ; Harro Walk [3]

    1. [1] Budapest University of Technology and Economics, Budapest, Hungary

    2. [2] Queen's University, Canada

    3. [3] University of Stuttgart, Stuttgart, Germany

  • Published in: Test: An Official Journal of the Spanish Society of Statistics and Operations Research, ISSN-e 1863-8260, ISSN 1133-0686, Vol. 34, No. 1, 2025, pp. 262-287
  • Language: English
  • Full text not available
  • Abstract
    • We study the problem of lossless feature selection for a d-dimensional feature vector X and label Y for binary classification as well as nonparametric regression. For an index set S ⊆ {1, …, d}, consider the selected |S|-dimensional feature subvector X_S = (X^(i), i ∈ S). If L* and L*(S) stand for the minimum risk based on X and X_S, respectively, then X_S is called lossless if L* = L*(S). For classification, the minimum risk is the Bayes error probability, while in regression, the minimum risk is the residual variance. We introduce nearest-neighbor-based test statistics to test the hypothesis that X_S is lossless. This test statistic is an estimate of the excess risk L*(S) − L*. Surprisingly, estimating this excess risk turns out to be a functional estimation problem that does not suffer from the curse of dimensionality, in the sense that the convergence rate does not depend on the dimension d. For an appropriate threshold, the corresponding tests are proved to be consistent under conditions on the distribution of (X, Y) that are significantly milder than in previous work. Also, our threshold is universal (dimension independent), in contrast to earlier methods where for large d the threshold becomes too large to be useful in practice.
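
To make the idea in the abstract concrete, here is a minimal, hypothetical sketch (Python, using NumPy and SciPy) of a nearest-neighbor-based estimate of the excess risk L*(S) − L* in the regression setting. It is not the authors' exact statistic: it simply uses the classical 1-nearest-neighbor estimate of the residual variance, (1/(2n)) Σ_i (Y_i − Y_{NN(i)})², computed once on the full feature vector X and once on the candidate subvector X_S, and takes the difference. The function names, the choice of estimator, and the toy data are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch (not the authors' exact construction): a 1-NN based
# estimate of the excess residual variance L*(S) - L* for regression.
import numpy as np
from scipy.spatial import cKDTree


def one_nn_risk_estimate(X, Y):
    """Estimate the residual variance E[(Y - E[Y|X])^2] via the classical
    1-nearest-neighbor heuristic  (1/(2n)) * sum_i (Y_i - Y_{NN(i)})^2."""
    tree = cKDTree(X)
    # k=2 because the closest point to X_i is X_i itself; take the second.
    _, idx = tree.query(X, k=2)
    nn = idx[:, 1]
    return 0.5 * np.mean((Y - Y[nn]) ** 2)


def excess_risk_statistic(X, Y, S):
    """Difference of the 1-NN risk estimates on the subvector X_S and on the
    full vector X; large values suggest the feature subset S is not lossless."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return one_nn_risk_estimate(X[:, list(S)], Y) - one_nn_risk_estimate(X, Y)


# Toy usage: Y depends only on the first coordinate, so S = {0} should be
# (nearly) lossless while S = {1} should not.
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)
print(excess_risk_statistic(X, Y, S=[0]))  # close to 0
print(excess_risk_statistic(X, Y, S=[1]))  # clearly positive
```

In a test of this style, losslessness of S would be rejected when the statistic exceeds a small threshold; the paper's contribution lies precisely in proving consistency of such nearest-neighbor-based tests with a universal, dimension-independent threshold.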

