Evaluation of unsupervised machine learning algorithms with climate data

  • Ramírez, Juan Sebastián [1]; Duque, Néstor Darío [1]
    1. [1] Universidad Nacional de Colombia, Colombia


  • Source: Ingeniería y desarrollo: revista de la División de Ingeniería de la Universidad del Norte, ISSN 0122-3461, Vol. 40, No. 2, 2022, pp. 131-165
  • Language: English
  • DOI: 10.14482/inde.40.02.622.553
  • Parallel titles:
    • Evaluación de algoritmos de aprendizaje de máquina no supervisados con datos climáticos
  • Abstract
    • English

      When using climate data, researchers have difficulty determining the best-performing clustering algorithm and parameters for processing a specific dataset. We evaluate the following unsupervised machine learning algorithms: K-means, K-medoids, and Linkage-complete, applied to three datasets of climatological variables (temperature, rainfall, relative humidity, and solar radiation) from three meteorological stations located in the department of Caldas, Colombia, at different heights above sea level. Five scenarios are defined for 2, 3, and 5 clusters for each of the two partitional algorithms, and five scenarios for the hierarchical algorithm, at each of the meteorological stations. The scenarios apply different quantities and groupings of variables and use Euclidean distance, the Davies-Bouldin index to evaluate cluster quality, normalization with techniques such as range transformation and Z-transformation, several iterations of the algorithm, and dimensionality reduction with PCA. In addition, the computational cost is evaluated. This study can guide researchers on certain decisions in cluster analysis of meteorological data and help identify the algorithm and the most important parameters to consider for the best performance, according to particular conditions and requirements.
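
The kind of scenario the abstract describes can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: it assumes scikit-learn, uses synthetic values in place of the station records, and the names `evaluate`, `centers`, and `spread` are hypothetical. K-medoids is left out because it is not part of core scikit-learn. For the Davies-Bouldin index, lower values indicate more compact, better-separated clusters.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import davies_bouldin_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in for one station's records (NOT the paper's data):
# columns = temperature, rainfall, relative humidity, solar radiation.
rng = np.random.default_rng(0)
centers = np.array([[14.0, 2.0, 85.0, 250.0],
                    [19.0, 6.0, 75.0, 400.0],
                    [25.0, 12.0, 60.0, 600.0]])
spread = np.array([1.0, 1.0, 3.0, 30.0])
X = np.vstack([c + spread * rng.standard_normal((100, 4)) for c in centers])


def evaluate(X, scaler, n_components=2, ks=(2, 3, 5)):
    """Return {(algorithm, k): Davies-Bouldin index} for one scenario."""
    # Normalize, then reduce dimensionality with PCA.
    X_scaled = scaler.fit_transform(X)
    X_reduced = PCA(n_components=n_components).fit_transform(X_scaled)
    scores = {}
    for k in ks:
        models = {
            # Partitional algorithm, restarted 10 times from random seeds.
            "k-means": KMeans(n_clusters=k, n_init=10, random_state=0),
            # Hierarchical algorithm with complete linkage (Euclidean distance).
            "complete-linkage": AgglomerativeClustering(n_clusters=k, linkage="complete"),
        }
        for name, model in models.items():
            labels = model.fit_predict(X_reduced)
            scores[(name, k)] = davies_bouldin_score(X_reduced, labels)
    return scores


# Compare range transformation (min-max) against Z-transformation.
results = {
    "range": evaluate(X, MinMaxScaler()),
    "z": evaluate(X, StandardScaler()),
}
for norm, scores in results.items():
    for (name, k), dbi in sorted(scores.items()):
        print(f"{norm:5s} {name:16s} k={k}  DBI={dbi:.3f}")
```

Comparing the printed indices across normalization methods, algorithms, and cluster counts mirrors, in miniature, the comparison the study reports; the number of PCA components and the number of K-means restarts are arbitrary choices made for this sketch.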

