This PhD thesis explores the integration of machine learning operations (MLOps) into remote sensing, addressing the challenges posed by the continuous growth of open satellite data from initiatives like Copernicus and Landsat. The research focuses on developing scalable, efficient, and reproducible methodologies to process high-resolution satellite data and operationalize advanced environmental monitoring workflows. The ultimate goal is to enable any user to generate land cover classification maps at any scale, abstracting away the methodology so that effort can be concentrated on fieldwork.
The first contribution is the design of a scalable methodology for land cover classification using satellite data, which prioritizes modularity, efficiency, and distributability. This methodology processes multispectral data from Sentinel-2 and a digital elevation model from NASA's Terra satellite, enabling high-resolution land cover classification over large regions. Its implementation is demonstrated in a Mediterranean Basin case study covering over 4 million km² with pixels of 100 m² (10 m × 10 m), showcasing its scalability and environmental value. This methodology culminated in the development of the open-source Python package LandCoverPy, which facilitates its application in diverse contexts.
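To give a sense of the scale involved, the area and pixel size quoted above can be turned into a rough pixel count. The following is a back-of-the-envelope sketch only: the function name `pixel_count` is hypothetical and not part of LandCoverPy, and it assumes square, non-overlapping pixels covering exactly the stated area.

```python
M2_PER_KM2 = 1_000_000  # square metres in one square kilometre


def pixel_count(area_km2: float, pixel_area_m2: float) -> int:
    """Approximate number of pixels needed to tile an area (illustrative helper)."""
    return round(area_km2 * M2_PER_KM2 / pixel_area_m2)


# Lower bound for the case study: 4 million km² at 100 m² per pixel.
n_pixels = pixel_count(4_000_000, 100)
print(f"{n_pixels:.1e} pixels")  # prints "4.0e+10 pixels"
```

At roughly forty billion pixels per map, processing on a single machine becomes impractical, which is why the methodology emphasizes distributability.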
The second contribution is the development of an open-source MLOps framework offering the same functionality as platforms such as Amazon SageMaker or ClearML. Built on Kubernetes and Python, this framework integrates state-of-the-art tools such as MLflow, Seldon Core, Prometheus, and Kafka to automate the lifecycle of machine learning models, from deployment to monitoring and retraining. Once deployed, the framework allows any user to manage the lifecycle of their models automatically with minimal configuration.
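The monitoring-to-retraining step of such a lifecycle can be illustrated with a small sketch. This is not the framework's implementation: the class `RetrainMonitor`, its parameters, and the rolling-accuracy rule are assumptions made for illustration, written in plain Python without any of the tools named above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RetrainMonitor:
    """Hypothetical monitor: flags retraining when rolling accuracy drops."""

    threshold: float = 0.85  # assumed minimum acceptable accuracy
    window: int = 100        # assumed number of recent predictions to track
    _hits: deque = field(default_factory=deque)

    def record(self, predicted, actual) -> bool:
        """Store one labelled prediction; return True if retraining is due."""
        self._hits.append(predicted == actual)
        if len(self._hits) > self.window:
            self._hits.popleft()  # keep only the most recent `window` results
        if len(self._hits) < self.window:
            return False  # not enough evidence yet
        accuracy = sum(self._hits) / len(self._hits)
        return accuracy < self.threshold


monitor = RetrainMonitor(threshold=0.8, window=5)
# Four correct predictions followed by two wrong ones.
observations = [("forest", "forest")] * 4 + [("forest", "water")] * 2
decisions = [monitor.record(pred, truth) for pred, truth in observations]
print(decisions)
```

In a production setting the accuracy signal would typically come from a metrics system and the retraining decision would be published as an event, but those integrations are outside the scope of this sketch.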
The third contribution is the integration of the land cover classification methodology into the MLOps framework. This integration enables multiple models (e.g., annual models) to be maintained simultaneously, along with their monitoring and retraining. Additionally, a graphical user interface (GUI) has been developed that provides an intuitive, map-based environment for selecting areas of interest, making predictions, and visualizing results. By lowering the barrier to access, it allows researchers and organizations to conduct large-scale environmental studies efficiently.
Together, these contributions provide an open-source toolkit that enables any researcher to create high-quality maps at any scale. The methodologies and tools developed support studies addressing pressing global challenges, including climate change, land-use planning, and biodiversity conservation.
Future work includes integrating Light Detection and Ranging (LiDAR) data for enhanced classification and exploring advanced machine learning architectures to further optimize processing capabilities. Furthermore, a cloud-based service is proposed that leverages the MLOps framework and the land cover methodology so that users can fine-tune models on their own data and create custom maps without requiring dedicated infrastructure.