Effect of feature selection on prototype-based classification (Efecto de la selección de rasgos en la clasificación basada en prototipos)

Yumilka Bárbara Fernández Hernández ^[1] ; Rafael Bello Pérez ^[2] ; Yaima Filiberto Cabrera ^[1] ; Mabel Frías Dominguez ^[1] ; Yaile Caballero Mota ^[1]
1. [1] Universidad de Camagüey, Cuba
2. [2] Universidad Central de Las Villas, Cuba
Published in: Revista Cubana de Ciencias Informáticas, e-ISSN 2227-1899, Vol. 10, No. 4, 2016
Language: Spanish
Parallel titles:
- Effect of the features selection in the classification based on prototypes
Abstract
Feature selection is a data preprocessing technique that seeks a subset of attributes that improves classifier performance. Because prototype generation is very useful in classification problems, this paper proposes a new method that combines prototype construction with feature selection. Prototypes are built with the NP-BASIR method (the NP-BASIR-Class method): similarity relations are used to granulate the universe, the granulation produces similarity classes of objects, and a prototype is constructed for each similarity class. Features are selected with the REDUCT-SIM method, which computes reducts using the quality of similarity measure and the Particle Swarm Optimization (PSO) technique. The main contribution of this research is to demonstrate the usefulness of combining feature selection with prototype construction. The proposed algorithm was tested on international data sets and compared with well-known prototype generation algorithms. The experimental results show that the proposed method performs satisfactorily; its main advantage is that it reduces both the number of objects and the number of attributes in the data set without significantly changing classification quality relative to the original data set.
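The granulation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' NP-BASIR implementation: the similarity function, the threshold value, the greedy covering order, the mean-based prototype, and every name used (`similarity`, `build_prototypes`, `classify`, `mask`) are assumptions chosen for illustration. The feature mask is supplied by hand here; it stands in for a reduct and is not computed by REDUCT-SIM or PSO.

```python
from statistics import mean


def similarity(a, b, mask):
    """Mean per-feature similarity over the selected features.

    Assumes every feature value is already scaled to [0, 1] and that
    `mask` selects at least one feature.
    """
    selected = [i for i, keep in enumerate(mask) if keep]
    return mean(1.0 - abs(a[i] - b[i]) for i in selected)


def build_prototypes(X, y, mask, threshold=0.85):
    """Greedy granulation: build one prototype per similarity class.

    The similarity class of an object is assumed to contain the objects
    with the same label whose similarity to it reaches `threshold`.
    """
    covered = [False] * len(X)
    prototypes = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if covered[i]:
            continue  # already represented by an earlier prototype
        members = [
            j for j, (xj, yj) in enumerate(zip(X, y))
            if yj == yi and similarity(xi, xj, mask) >= threshold
        ]
        for j in members:
            covered[j] = True
        # Prototype: feature-wise mean of the class, selected features only.
        proto = [
            mean(X[j][k] for j in members)
            for k, keep in enumerate(mask) if keep
        ]
        prototypes.append((proto, yi))
    return prototypes


def classify(x, prototypes, mask):
    """Nearest-prototype rule: return the label of the most similar prototype."""
    reduced = [v for v, keep in zip(x, mask) if keep]
    all_features = [True] * len(reduced)
    best = max(prototypes, key=lambda p: similarity(reduced, p[0], all_features))
    return best[1]


# Toy data (made up for this sketch): the third feature is noise.
X = [[0.10, 0.20, 0.90], [0.12, 0.22, 0.10],
     [0.90, 0.80, 0.50], [0.88, 0.82, 0.40]]
y = ["a", "a", "b", "b"]
mask = [True, True, False]  # hand-picked subset standing in for a reduct

prototypes = build_prototypes(X, y, mask)
print(len(prototypes), classify([0.15, 0.25, 0.70], prototypes, mask))
```

On this toy data, dropping the noisy third feature lets the two objects labelled "a" fall into one similarity class, so the reduced data set has fewer objects and fewer attributes, which is the effect the abstract reports for the combined method.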