Documat


Abstract of Feature Ranking for Feature Sorting and Feature Selection: FR4(FS)2

Paula Santana, Alberto Merchán, Alba Márquez Rodríguez, Antonio Javier Tallón Ballesteros

  • This paper proposes a methodology for feature sorting and feature selection in the context of supervised machine learning algorithms. Feature sorting has been shown to be a step that may play a paramount role in machine learning. Nonetheless, scalability is an important drawback. This paper proposes adding a further stage that retains only the attributes with a positive influence (att+) and limits them to a predefined percentage of the att+ set. The contribution is a new methodology in which not all attributes are included in the data mining task, but only those with a positive influence, up to a certain limit. We followed two different types of sorting by means of different feature ranking methods. The approach has been assessed on three binary problems with between 1000 and 10000 features and from 200 to 7000 instances; the test bed includes challenging data sets from NIPS 2003. According to the experimental results for InfoGain and GainRatio, 90% of the attributes with positive influence are enough to obtain results that are, in most cases, comparable to those obtained with the raw data. In addition, the time required to train the classifiers is shorter, and the time saved could be used to process more instances.
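
The selection stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "positive influence" means a strictly positive ranking score, that features are discrete, and that the ranking is plain information gain computed from scratch; the function names, the numerical tolerance, and the round-up rule for the percentage are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Information gain of one discrete feature with respect to the class."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Class entropy remaining after splitting on the feature's values.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_positive_top(rows, labels, fraction=0.9, tol=1e-12):
    """Rank features, keep those with score > tol (att+), then keep the
    top `fraction` of att+. Returns (selected feature indices, all scores).

    Assumption: the count is rounded up; the abstract does not say how
    the percentage is turned into a number of attributes.
    """
    n_features = len(rows[0])
    scores = [info_gain([r[j] for r in rows], labels) for j in range(n_features)]
    # Feature sorting: best score first.
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    # Feature selection, step 1: drop attributes without positive influence.
    positive = [j for j in ranked if scores[j] > tol]
    # Feature selection, step 2: keep a predefined percentage of att+.
    keep = math.ceil(fraction * len(positive))
    return positive[:keep], scores
```

Calling `select_positive_top(rows, labels, fraction=0.9)` on a list of feature tuples and a parallel list of class labels returns the retained column indices in ranked order, which can then be used to project the data before training a classifier. A gain-ratio variant would divide each score by the entropy of the feature's own value distribution.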

