A methodology for finding the best classifier in business decision-making (Una metodología para encontrar el mejor clasificador en decisión empresarial)

José C. Vega Vilca; David A. Torres Núñez

Authors: José C. Vega Vilca, David A. Torres Núñez
Source: Revista de ciencias económicas, ISSN 0252-9521, e-ISSN 2215-3489, Vol. 33, No. 1, 2015, pp. 63-73
Language: Spanish
DOI: 10.15517/rce.v33i1.19971
Abstract
  This study presents a methodology for improving analysis strategies in situations where supervised classification is the main tool for business decision-making. The need to assign new customers to one of several groups, defined according to the characteristics of the subject, is analyzed by computing the error rate.

  To this end, programs were written in R to calculate the error rate of each of nine classifiers using 10-fold cross-validation (Stone, 1974) over 50 permutations of the data under study. For each data set analyzed, ANOVA showed significant differences among the classifiers' mean error rates (reported as p=0.00); the best classifier is therefore taken to be the one with the lowest error rate.
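The procedure summarized in the abstract (repeated 10-fold cross-validation followed by an ANOVA on the resulting error rates) can be illustrated with a minimal sketch. This is not the authors' code: their programs were written in R and compared nine classifiers that the abstract does not name. The Python sketch below assumes synthetic two-class data and two stand-in classifiers (a majority-class baseline and a nearest-centroid rule), and every function and variable name in it is hypothetical.

```python
import random
import statistics
from collections import Counter


def fit_majority(X, y):
    """Stand-in classifier: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_centroid(X, y):
    """Stand-in classifier: predict the class with the closest feature-wise mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def kfold_error_rate(fit, X, y, k, rng):
    """Misclassification rate from k-fold cross-validation on one permutation."""
    idx = list(range(len(X)))
    rng.shuffle(idx)                       # one random permutation of the data
    folds = [idx[i::k] for i in range(k)]  # k folds of (nearly) equal size
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(X[i]) != y[i] for i in fold)
    return errors / len(X)


def repeated_cv(fit, X, y, n_permutations=50, k=10, seed=0):
    """One cross-validated error rate per permutation of the data."""
    rng = random.Random(seed)
    return [kfold_error_rate(fit, X, y, k, rng) for _ in range(n_permutations)]


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA; each group holds one classifier's error rates."""
    values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:                     # no within-group variation at all
        return float("inf")
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Synthetic data (an assumption for illustration): two Gaussian classes, 30 points each.
data_rng = random.Random(42)
X = [[data_rng.gauss(m, 1.0), data_rng.gauss(m, 1.0)] for m in (0.0, 3.0) for _ in range(30)]
y = [label for label in (0, 1) for _ in range(30)]

classifiers = {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid}
results = {name: repeated_cv(fit, X, y) for name, fit in classifiers.items()}
f_stat = one_way_anova_f(list(results.values()))

for name, rates in results.items():
    print(name, "mean error rate:", round(statistics.fmean(rates), 3))
print("ANOVA F statistic:", f_stat)
print("lowest mean error:", min(results, key=lambda n: statistics.fmean(results[n])))
```

The sketch stops at the F statistic; turning it into a p-value requires the F distribution, which in practice would come from a statistics library rather than hand-written code.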
References
- Antipov, E., & Pokryshevskaya, E. (2010). Applying CHAID for logistic regression diagnostics and classification accuracy improvement....
- Blake, C. L., & Merz, C. J. (1998). Churn Data Set. University of California. Department of Information and Computer Science, Irvine, CA....
- Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Boca Raton, FL: CRC Press LLC.
- Dobson, A. (2002). An Introduction to Generalized Linear Models. Boca Raton, FL: CRC Press LLC. doi:10.1002/sim.1493
- Hothorn, T., Hornik, K., van de Wiel, M., & Zeileis, A. (2006). A Lego System for Conditional Inference. The American Statistician, 60...
- Manning, C., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. London: Cambridge University Press.
- Ripley, B. D. (1996). Pattern Recognition and Neural Networks. London: Cambridge University Press.
- Smith, C. (1947). Some examples of discrimination. Annals of Eugenics, 18, 272–282.
- Stone, M. (1974). Cross-validatory choice and the assessment of statistical predictions (with discussion). Journal of the Royal Statistical...
- Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S. New York, NY: Springer-Verlag. doi:10.1007/978-0-387-21706-2
- Witten, I., Frank, E., & Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. Burlington, MA: Morgan Kaufmann.