Proactive Forest: Análisis del impacto de la generalización del parámetro de diversidad

Nayma Cepero Pérez; Mailyn Moreno Espino; Milton García Borroto; Eduardo F. Morales

Authors: Nayma Cepero Pérez, Mailyn Moreno Espino, Milton García Borroto, Eduardo F. Morales
Published in: Revista Cubana de Ciencias Informáticas, ISSN-e 2227-1899, Vol. 17, No. 1, 2023
Language: Spanish
Parallel titles:
- Proactive Forest: Analysis of the impact of the generalization of the diversity parameter
Abstract
  The ease of interpreting the predictions made by a learned model is one of the advantages that make decision trees one of the most effective techniques for data-mining tasks. The predictions of many decision trees can be combined to improve the final decision; this idea gives rise to the concept of decision forests. To build a decision forest, the individual trees must have high predictive power and, at the same time, differ from one another. This difference is known as the diversity of the decision forest, and achieving it is not trivial. The most widely used decision forest algorithms rely on randomness while constructing each tree to obtain diversity; however, randomness does not always guarantee adequate diversity. Proactive Forest is a decision forest construction algorithm that introduces a mechanism for controlling randomness, based on an update function for the probabilities with which the attributes are used. One of its most important elements is the diversity parameter, which was initially set to 0.1. The objective of this work is to analyze the use of a single value of the diversity parameter for all datasets. The results show that it is not correct to generalize a single diversity value, since effectiveness varies depending on the value used.
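The abstract describes the probability-update mechanism only at a high level and does not give the update function itself. The Python sketch below is therefore an illustrative assumption, not the authors' formula: the names `update_probabilities` and `sample_attributes`, the multiplicative penalty, and the use of normalized attribute importances are all hypothetical. It only shows how a diversity parameter such as 0.1 could control how strongly attributes that were important in the previous tree are down-weighted when the next tree is built.

```python
import random


def update_probabilities(probs, importances, diversity=0.1):
    """Hypothetical update rule (an assumption, not the published one).

    probs       -- current selection probability of each attribute (sums to 1)
    importances -- importance of each attribute in the last tree, in [0, 1]
    diversity   -- in [0, 1]; 0 leaves probs unchanged, larger values
                   penalize recently important attributes more strongly
    """
    adjusted = [p * (1.0 - diversity * imp) for p, imp in zip(probs, importances)]
    total = sum(adjusted)
    if total == 0:
        # Degenerate case: every attribute was fully penalized; fall back to uniform.
        return [1.0 / len(probs)] * len(probs)
    return [a / total for a in adjusted]


def sample_attributes(probs, k, rng):
    """Draw k distinct attribute indices, weighted by probs."""
    candidates = list(range(len(probs)))
    weights = list(probs)
    chosen = []
    for _ in range(k):
        pos = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pos))
        weights.pop(pos)
    return chosen


# Four attributes, initially equally likely; attribute 0 dominated the last tree.
probs = [0.25, 0.25, 0.25, 0.25]
importances = [0.7, 0.2, 0.1, 0.0]
new_probs = update_probabilities(probs, importances, diversity=0.1)
candidate_split_attributes = sample_attributes(new_probs, 2, random.Random(0))
```

Under these assumptions, raising `diversity` shifts more probability away from attribute 0, which is one way to see why a single value may not suit every dataset.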