Documat


Learning with Online Drift Detection

    1. [1] Universidad de las Ciencias Informáticas, Cuba
    2. [2] Universidad de Málaga, Málaga, Spain
    3. [3] Universidad De Granma, Cuba
    4. [4] Universidad de Camagüey, Cuba

  • Published in: Computación y Sistemas (CyS), ISSN 1405-5546, e-ISSN 2007-9737, Vol. 18, No. 1, 2014, pp. 169-183
  • Language: Spanish
  • DOI: 10.13053/CyS-18-1-2014-026
  • Parallel titles:
    • Learning with Online Drift Detection
  • Abstract
    • Spanish

      Nowadays, many sources generate large amounts of data over long periods of time, which must therefore be processed incrementally. Because of the temporal dimension of these data, a previously induced learning model may become inconsistent with the current data, a problem commonly known as concept drift. A widely used strategy for detecting concept drift is to monitor some performance measure of the model over time. If this measure indicates a significant deterioration of the model, actions are taken to adapt the learning process. In this context, this article proposes a new concept drift detection method that does not depend on the learning algorithm. Hoeffding's probability inequality is used to provide probabilistic guarantees for detecting changes in the mean of streams of real values. The method compares the means of two samples by identifying a single relevant cut point in the sequence of real values, thus maintaining a fixed number of counters with constant time complexity. Preliminary empirical evaluations on well-known data streams, with different concept drift detectors and learning algorithms, show that the proposed method is promising.

    • English

      Learning from data streams is a problem of growing interest. The target function of a data stream may change over time, so a learning model induced from earlier data may become inconsistent with the current data. This problem is commonly known as concept drift. A widely used strategy for handling concept drift is to continuously monitor a chosen performance measure of the model over time; if the model's performance drops, appropriate actions are taken to adapt the model. With this in mind, this paper proposes a new method for detecting concept drift that is independent of the learning algorithm. We use a probability inequality (Hoeffding's inequality) to offer probabilistic guarantees for detecting significant changes in the mean of a sequence of real values. Detection is based on comparing the averages of two samples, obtained by identifying a single relevant cut point in this sequence; the method maintains a fixed number of counters and has constant time complexity. Like some previous approaches, our method builds on ideas from statistical process control. Preliminary empirical evaluations on well-known data streams, with several change detectors and classifiers, reveal advantages of the proposed method.

  • References
    • Agrawal, R., Imielinski, T., Swami, A. (1993). Database mining: A performance perspective. IEEE Transactions on Knowledge and Data Engineering...
    • Aha, D.W., Kibler, D., Albert, M.K. (1991). Instance-based learning algorithms. Machine Learning. 6. 37-66
    • Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J. (2002). Models and issues in data stream systems. Twenty-first ACM SIGMOD-SIGACT-SIGART...
    • Baena-García, M., del Campo-Ávila, J., Fidalgo, R., Bifet, A., Gavaldà, R., Morales-Bueno, R. (2006). Early Drift Detection Method. Fourth International...
    • Basseville, M., Nikiforov, I.V. (1993). Detection of Abrupt Changes: Theory and Application. Prentice-Hall. Englewood Cliffs, NJ.
    • Beringer, J., Hüllermeier, E. (2007). Efficient instance-based learning on data streams. Intelligent Data Analysis. 11. 627-650
    • Bifet, A., Gavaldà, R. (2007). Learning from time-changing data with adaptive windowing. SIAM International Conference on Data Mining. 443-448
    • Bifet, A., Gavaldà, R. (2009). Adaptive learning from evolving data streams. 8th International Symposium on Intelligent Data Analysis: Advances...
    • Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B. (2010). MOA: Massive Online Analysis. Journal of Machine Learning Research. 11. 1601-1604
    • Bifet, A., Holmes, G., Pfahringer, B., Frank, E. (2010). Fast perceptron decision tree learning from evolving data streams. 14th Pacific-Asia...
    • Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., Gavaldà, R. (2009). New ensemble methods for evolving data streams. 15th ACM SIGKDD International...
    • del Campo-Ávila, J., Ramos-Jiménez, G., Gama, J., Morales-Bueno, R. (2008). Improving the performance of an incremental algorithm driven by...
    • Chernoff, H. (1952). A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations. Annals of Mathematical...
    • Cunningham, P., Nowlan, N., Delany, S.J., Haahr, M. (2003). A case-based approach to spam filtering that can track concept drift. ICCBR'2003...
    • Datar, M., Gionis, A., Indyk, P., Motwani, R. (2002). Maintaining stream statistics over sliding windows. SIAM Journal on Computing. 31. 1794-1813
    • Domingos, P., Hulten, G. (2000). Mining High-Speed Data Streams. Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data...
    • Dredze, M., Oates, T., Piatko, C. (2010). We're not in Kansas anymore: detecting domain changes in streams. Conference on Empirical Methods...
    • Dries, A., Rückert, U. (2009). Adaptive concept drift detection. Statistical Analysis and Data Mining. 2. 311-327
    • Ferrer-Troyano, F.J., Aguilar, J.S., Riquelme, J.C. (2005). Incremental Rule Learning and Border Examples Selection from Numerical Data Streams...
    • Frías, I., Ortiz, A., Ramos, G., Morales, R., Caballero, Y. (2010). Clasificadores y multiclasificadores con cambio de concepto basados en árboles...
    • Gama, J. (2010). Knowledge Discovery from Data Streams. Chapman and Hall/CRC. Boca Raton, FL.
    • Gama, J., Gaber, M.M. (2007). Learning from Data Streams: Processing Techniques in Sensor Networks. Springer. Berlin/New York.
    • Gama, J., Kosina, P. (2011). Learning decision rules from data streams. Twenty-Second International Joint Conference on Artificial Intelligence...
    • Gama, J., Medas, P., Castillo, G., Rodrigues, P. (2004). Learning with drift detection. Advances in Artificial Intelligence, SBIA 2004, Lecture...
    • Gama, J., Sebastião, R., Rodrigues, P. (2009). Issues in Evaluation of Stream Learning Algorithms. 15th ACM SIGKDD International Conference...
    • Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A. (2006). A Kernel Method for the Two Sample Problem. Twentieth Annual Conference...
    • Harries, M.B., Sammut, C., Horn, K. (1998). Extracting hidden context. Machine Learning - Special issue on context sensitivity and concept drift...
    • Hawkins, D.M., Deng, Q. (2010). A Nonparametric Change-Point Control Chart. Journal of Quality Technology. 42. 165-173
    • Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association. 58. 13-30
    • Ikonomovska, E., Gama, J., Dzeroski, S. (2011). Learning model trees from evolving data streams. Data Mining and Knowledge Discovery. 23. 128-168
    • Katakis, I., Tsoumakas, G., Vlahavas, I. (2008). An Ensemble of Classifiers for coping with Recurring Contexts in Data Streams. conference...
    • Kawahara, Y., Sugiyama, M. (2009). Change-Point Detection in Time-Series Data by Direct Density-Ratio Estimation. SIAM International Conference...
    • Kawahara, Y., Yairi, T., Machida, K. (2007). Change-Point Detection in Time-Series Data Based on Subspace Identification. Seventh IEEE International...
    • Kifer, D., Ben-David, S., Gehrke, J. (2004). Detecting Change in Data Streams. Thirtieth International Conference on Very Large Data Bases...
    • Klinkenberg, R. (2004). Learning drifting concepts: example selection vs. example weighting. Intelligent Data Analysis. 8. 281-300
    • Klinkenberg, R., Joachims, T. (2000). Detecting Concept Drift with Support Vector Machines. Seventeenth International Conference on Machine...
    • Kubat, M., Widmer, G. (1994). Adapting to drift in continuous domains. Machine Learning: ECML-95, Lecture Notes in Computer Science. 912...
    • Pettitt, A.N. (1979). A Non-Parametric Approach to the Change-Point Problem. Journal of the Royal Statistical Society, Series C (Applied...
    • Ross, G., Tasoulis, D.K., Adams, N.M. (2011). Nonparametric Monitoring of Data Streams for Changes in Location and Scale. Technometrics. 53...
    • Salganicoff, M. (1997). Tolerating Concept and Sampling Shift in Lazy Learning Using Prediction Error Context Switching. Artificial Intelligence...
    • Schlimmer, J.C., Granger Jr., R.H. (1986). Incremental learning from noisy data. Machine Learning. 1. 317-354
    • Scholz, M., Klinkenberg, R. (2007). Boosting classifiers for drifting concepts. Intelligent Data Analysis - Knowledge Discovery from Data...
    • Yamanishi, K., Takeuchi, J.I. (2002). A Unifying Framework for Detecting Outliers and Change Points from Non-Stationary Time Series Data...
    • Zhou, C., Zou, C., Zhang, Y., Wang, Z. (2009). Nonparametric control chart based on change-point model. Statistical Papers. 50. 13-28
    • Žliobaitė, I. (2009). Learning under Concept Drift: an Overview. Vilnius University.
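
The abstract describes the detection scheme only at a high level: two sample means are split at a single cut point, a fixed number of counters is kept, and Hoeffding's inequality bounds the test. The Python sketch below illustrates that idea under stated assumptions. It is not the authors' published algorithm.

It assumes the following:
- Monitored values lie in [0, 1], for example a classifier's 0/1 prediction errors.
- The values are independent.
- Only increases in the mean matter.

For two samples of sizes k and m with a common mean, Hoeffding's inequality bounds the probability that the second mean exceeds the first by at least ε by exp(-2ε² / (1/k + 1/m)). The sketch therefore signals a change when the gap reaches ε = sqrt((1/k + 1/m) · ln(1/δ) / 2).

The class name `HoeffdingMeanDriftDetector`, the parameter `delta`, and the cut-point rule are hypothetical. Under that rule, the cut moves to the current position whenever the running mean's upper bound, mean + sqrt(ln(1/δ) / (2n)), is the lowest seen so far. Because the cut point is chosen from the data, δ is only a heuristic guide in this sketch.

```python
import math


class HoeffdingMeanDriftDetector:
    """Hypothetical sketch of a Hoeffding-bound detector for increases in the
    mean of a stream of values in [0, 1], such as a classifier's 0/1 errors.

    Not the authors' published algorithm: it only illustrates comparing two
    sample means split at a single cut point, using four counters and O(1)
    work per observation.
    """

    def __init__(self, delta: float = 0.001) -> None:
        if not 0.0 < delta < 1.0:
            raise ValueError("delta must lie in (0, 1)")
        self.delta = delta  # assumed significance level of the test
        self.reset()

    def reset(self) -> None:
        # Counters for all values seen since the last detected change.
        self.n = 0
        self.total = 0.0
        # Counters for the prefix that ends at the single cut point.
        self.n_cut = 0
        self.total_cut = 0.0

    def _bound(self, inv_size: float) -> float:
        # Hoeffding deviation sqrt(inv_size * ln(1/delta) / 2) for [0, 1] values;
        # inv_size is 1/n for one mean, or 1/k + 1/m for a difference of means.
        return math.sqrt(inv_size * math.log(1.0 / self.delta) / 2.0)

    def _upper(self, count: int, total: float) -> float:
        # Upper confidence bound on the mean of `count` values summing to `total`.
        return total / count + self._bound(1.0 / count)

    def update(self, x: float) -> bool:
        """Add one value in [0, 1]; return True when an increase is detected."""
        if not 0.0 <= x <= 1.0:
            raise ValueError("values must lie in [0, 1]")
        self.n += 1
        self.total += x

        # Move the cut point to the current position if the prefix seen so far
        # has the lowest upper bound on its mean (assumed cut-point rule).
        if self.n_cut == 0 or self._upper(self.n, self.total) <= self._upper(
            self.n_cut, self.total_cut
        ):
            self.n_cut, self.total_cut = self.n, self.total

        # Second sample: the values after the cut, recovered from the counters.
        n_after = self.n - self.n_cut
        if n_after == 0:
            return False
        mean_before = self.total_cut / self.n_cut
        mean_after = (self.total - self.total_cut) / n_after

        # One-sided Hoeffding test on the difference of the two sample means.
        if mean_after - mean_before >= self._bound(1.0 / self.n_cut + 1.0 / n_after):
            self.reset()  # forget the statistics of the old concept
            return True
        return False


if __name__ == "__main__":
    detector = HoeffdingMeanDriftDetector(delta=0.001)
    # Error rate 0.1 for 1000 steps, then every prediction is wrong.
    stream = [1.0 if i % 10 == 0 else 0.0 for i in range(1000)] + [1.0] * 200
    print([i for i, x in enumerate(stream) if detector.update(x)])  # [1004]
```

The detector consumes only numbers, so any classifier can feed it its 0/1 errors. This mirrors the learner-independent design described in the abstract. The exact cut-point rule and the adaptation steps of the published method are not reproduced here.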
Article metadata obtained from SciELO México.
