Does the Order of Attributes Play an Important Role in Classification?

Tallón-Ballesteros, Antonio J.; Fong, Simon; Leal-Díaz, Rocío

doi:10.1007/978-3-030-29859-3_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11734)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems

Abstract

This paper proposes a methodology for feature sorting in the context of supervised machine learning algorithms. Feature sorting is defined as a procedure that reorders the initial arrangement of the attributes: a sorting algorithm assigns an ordinal number to every feature according to its importance, and the features are then rearranged from first to last following those ordinal numbers. Feature ranking, from the feature selection area, has been chosen as the representative technique to carry out the sorting. The contribution is a new methodology in which all attributes are kept in the data mining task and presented in different orders produced by different feature ranking methods. The approach has been assessed on ten binary and multi-class problems with fewer than 37 features and a number of instances ranging from 106 up to 28056; the test-bed includes one challenging data set with 21 labels and 23 attributes on which previous works were not able to reach an accuracy of at least fifty percent. ReliefF is a strong candidate for re-sorting the initial feature space, and the C4.5 algorithm achieved a promising global performance; additionally, PART (a rule-based classifier) and Support Vector Machines obtained acceptable results.
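The feature sorting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ranking score used here is the absolute Pearson correlation between each feature and the class label, chosen only as a simple stand-in for a feature ranking method such as ReliefF, and the function names (`abs_correlation`, `feature_order`, `sort_features`) and the toy data are hypothetical.

```python
from statistics import mean


def abs_correlation(column, labels):
    """Absolute Pearson correlation between one feature column and the labels.

    Stand-in ranking score (an assumption); the paper relies on feature
    ranking methods such as ReliefF instead.
    """
    mx, my = mean(column), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    var_x = sum((x - mx) ** 2 for x in column)
    var_y = sum((y - my) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column gives no ranking information
    return abs(cov) / (var_x * var_y) ** 0.5


def feature_order(rows, labels, score=abs_correlation):
    """Return ALL feature indices, most important first.

    Python's sort is stable, so tied features keep their original order.
    """
    n_features = len(rows[0])
    scores = [score([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])


def sort_features(rows, order):
    """Rearrange the columns of every row to follow `order`; nothing is dropped."""
    return [[r[j] for j in order] for r in rows]


# Made-up toy data: column 0 is unrelated to the label, column 1 is
# partly related, and column 2 matches the label exactly.
X = [[1.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [2.0, 2.0, 0.0],
     [2.0, 3.0, 1.0]]
y = [0, 1, 0, 1]

order = feature_order(X, y)
X_sorted = sort_features(X, order)
print(order)     # feature indices, most important first
print(X_sorted)  # same data with the columns rearranged
```

The re-sorted matrix keeps every attribute, which is the difference from feature selection: only the column order handed to the classifier changes.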

Acknowledgments

This work has been partially subsidised by TIN2014-55894-C2-R and TIN2017-88209-C2-2-R projects of the Spanish Inter-Ministerial Commission of Science and Technology (MICYT), FEDER funds and the P11-TIC-7528 project of the “Junta de Andalucía” (Spain).

Author information

Authors and Affiliations

Department of Electronic, Computer Systems and Automation Engineering, University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros
Department of Computer and Information Science, University of Macau, Taipa, Macao, Special Administrative Region of China
Simon Fong
Higher School of Computer Science, University of Seville, Seville, Spain
Rocío Leal-Díaz

Corresponding author

Correspondence to Antonio J. Tallón-Ballesteros.

Editor information

Editors and Affiliations

University of León, León, Spain
Hilde Pérez García
University of León, León, Spain
Lidia Sánchez González
University of León, León, Spain
Manuel Castejón Limas
University of A Coruña, Ferrol, Spain
Héctor Quintián Pardo
University of Salamanca, Salamanca, Spain
Emilio Corchado Rodríguez

About this paper

Cite this paper

Tallón-Ballesteros, A.J., Fong, S., Leal-Díaz, R. (2019). Does the Order of Attributes Play an Important Role in Classification? In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science, vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_32

DOI: https://doi.org/10.1007/978-3-030-29859-3_32
Published: 26 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29858-6
Online ISBN: 978-3-030-29859-3
eBook Packages: Computer Science, Computer Science (R0)