Does the Order of Attributes Play an Important Role in Classification?

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11734)

Abstract

This paper proposes a methodology for feature sorting in the context of supervised machine learning algorithms. Feature sorting is defined as a procedure that reorders the initial arrangement of the attributes: a sorting algorithm assigns an ordinal number to every feature according to its importance, and the features are then arranged from first to last following those ordinal numbers. Feature ranking, a technique from the feature selection area, has been chosen as the representative way to carry out the sorting. This contribution introduces a new methodology in which all attributes are included in the data mining task, arranged in different orders by different feature ranking methods. The approach has been assessed on ten binary and multi-class problems with fewer than 37 features and a number of instances ranging from 106 up to 28056; the test-bed includes one challenging data set with 21 labels and 23 attributes on which previous works were not able to reach an accuracy of at least fifty percent. ReliefF is a strong candidate for re-sorting the initial feature space, and the C4.5 algorithm achieved a promising global performance; additionally, PART (a rule-based classifier) and Support Vector Machines obtained acceptable results.
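The feature sorting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scorer below is the basic Relief weighting (one nearest hit and one nearest miss per instance, binary classes only), swapped in as a simpler stand-in for ReliefF, and the function names `relief_scores` and `sort_features` as well as the toy data are assumptions made for illustration. The point it shows is that every column is kept and only the column order changes.

```python
from typing import List, Sequence, Tuple


def relief_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Basic Relief weights (a simplified stand-in for ReliefF, binary classes)."""
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale every difference into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a: Sequence[float], b: Sequence[float], j: int) -> float:
        return abs(a[j] - b[j]) / spans[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        # A useful feature differs across classes and agrees within a class.
        for j in range(d):
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


def sort_features(
    X: Sequence[Sequence[float]], scores: Sequence[float]
) -> Tuple[List[List[float]], List[int]]:
    """Reorder all columns by descending score; no column is dropped."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    X_sorted = [[row[j] for j in order] for row in X]
    return X_sorted, order


# Toy data (hypothetical): column 1 separates the classes best, column 0 worst.
X = [
    [0.2, 0.0, 0.3],
    [0.8, 0.1, 0.4],
    [0.5, 0.9, 0.6],
    [0.1, 1.0, 0.5],
]
y = [0, 0, 1, 1]

scores = relief_scores(X, y)
X_sorted, order = sort_features(X, scores)
print(order)  # prints [1, 2, 0]
```

The reordered table `X_sorted` would then be passed to a classifier in place of the original one; which classifiers are sensitive to column order is the empirical question the paper addresses.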

Acknowledgments

This work has been partially subsidised by the TIN2014-55894-C2-R and TIN2017-88209-C2-2-R projects of the Spanish Inter-Ministerial Commission of Science and Technology (MICYT), FEDER funds, and the P11-TIC-7528 project of the “Junta de Andalucía” (Spain).

Author information

Corresponding author

Correspondence to Antonio J. Tallón-Ballesteros.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Tallón-Ballesteros, A.J., Fong, S., Leal-Díaz, R. (2019). Does the Order of Attributes Play an Important Role in Classification?. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science, vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_32

  • DOI: https://doi.org/10.1007/978-3-030-29859-3_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29858-6

  • Online ISBN: 978-3-030-29859-3

  • eBook Packages: Computer Science, Computer Science (R0)
