Automated machine learning to support model selection in supervised traffic forecasting

Juan S. Angarita Zapata

Author: Juan S. Angarita Zapata
Thesis supervisors: Antonio David Masegosa Arredondo (supervisor), Isaac Triguero Velázquez (supervisor)
Defended: at the Universidad de Deusto (Spain) in 2020
Language: English
Thesis examination committee: David Alejandro Pelta Mochcovsky (chair), Enrique Onieva Caracuel (secretary), Alberto Cano (member)
Links
- Open-access thesis at: hdl.handle.net
Abstract
- Intelligent Transportation Systems (ITS) produce large volumes of traffic data that are hard to manage, which motivates the use of data-driven approaches, particularly Machine Learning (ML), to analyze them. ITS data can be used by different applications, such as Traffic Forecasting (TF) schemes. TF has recently gained relevance because it helps to deal with traffic congestion by forecasting future states of different traffic measures (e.g. travel time). TF poses two main challenges to the ML paradigm. First, traffic data can be collected in multiple formats (e.g. traffic-counting measures, GPS tracks) and under different transportation settings (e.g. urban, freeway). These characteristics influence the performance of ML methods, and choosing the most competitive method from a set of candidates costs human effort and time. Second, raw traffic data usually needs to be preprocessed before being analyzed. Hence, deciding on the most suitable combination of data preprocessing techniques and ML method is a time-consuming task that demands specialized ML knowledge.
  
  Automated Machine Learning (AutoML) arises as a promising approach to the issues mentioned above in problem domains, such as TF, where expert ML knowledge is not always available or affordable. AutoML methods have been broadly used in other areas; however, they have been underexplored in TF. This raises the question of whether general-purpose AutoML guarantees competitive results while reducing the human-time costs of ML in TF. Moreover, current AutoML approaches suffer from issues that can affect their performance in TF as well as in other ML problems. The optimization process to find competitive pipelines is complicated and computationally costly because of the diversity of the search space and the high evaluation cost of the objective function. Alternative learning approaches (e.g. meta-learning) have been designed to try to overcome these issues, but they may not work properly on diverse datasets such as those found in TF. Therefore, this thesis focuses on the development of new AutoML approaches that are better suited to specific problem domains and can also offer competitive results in TF.
  
  We present a new AutoML method for supervised problems, such as TF, with a search strategy based on the construction of ensembles from a portfolio of multiple classifiers. This AutoML mechanism can better adapt to specific problem domains using data preprocessing techniques, ML methods and raw data. The proposed method can lead to better or competitive results with respect to the state of the art, both in the general-purpose field and in TF. This is accomplished by taking advantage of the automated generation of ensembles from a predefined set of ML pipelines. The use of these multiple classifier systems significantly speeds up the AutoML process, and it also opens the path towards AutoML frameworks based on ensemble strategies.
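
The idea of building an ensemble from a predefined portfolio of pipelines can be illustrated with a minimal sketch. This is not the method developed in the thesis: the synthetic data, the two preprocessing steps, the toy classifiers, and the greedy forward selection with majority voting are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import random
from collections import Counter

def make_data(n, seed):
    """Synthetic two-class data (an assumption; not real traffic data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(20.0 * label, 10.0)])
        y.append(label)
    return X, y

# Preprocessing steps: fitted on training data, each returns a transform.
def identity(X_train):
    return lambda X: X

def standardize(X_train):
    cols = list(zip(*X_train))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return lambda X: [[(v - m) / s for v, m, s in zip(row, means, stds)]
                      for row in X]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

# Toy classifiers: each takes (X, y) and returns a predict function.
def nearest_centroid(X, y):
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(c) / len(c) for c in zip(*rows)]
    return lambda X_new: [min(centroids, key=lambda c: sq_dist(x, centroids[c]))
                          for x in X_new]

def knn(k):
    def fit(X, y):
        def predict(X_new):
            out = []
            for x in X_new:
                nearest = sorted(range(len(X)), key=lambda i: sq_dist(x, X[i]))[:k]
                out.append(Counter(y[i] for i in nearest).most_common(1)[0][0])
            return out
        return predict
    return fit

def fit_pipeline(prep, model, X, y):
    transform = prep(X)
    predict = model(transform(X), y)
    return lambda X_new: predict(transform(X_new))

def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)

def vote(predictions):
    """Majority vote across a list of per-pipeline prediction lists."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

def greedy_ensemble(val_preds, y_val, rounds=7):
    """Greedy forward selection with replacement; keeps the best prefix."""
    chosen, history = [], []
    for _ in range(rounds):
        scores = {name: accuracy(vote([val_preds[m] for m in chosen + [name]]), y_val)
                  for name in val_preds}
        best = max(scores, key=scores.get)
        chosen.append(best)
        history.append(scores[best])
    size = history.index(max(history)) + 1  # shortest prefix with the top score
    return chosen[:size], history[size - 1]

X_train, y_train = make_data(200, seed=0)
X_val, y_val = make_data(100, seed=1)

# Portfolio: every combination of one preprocessing step and one classifier.
portfolio = {f"{p.__name__}+{name}": (p, m)
             for p in (identity, standardize)
             for name, m in (("centroid", nearest_centroid),
                             ("knn3", knn(3)), ("knn7", knn(7)))}
fitted = {name: fit_pipeline(p, m, X_train, y_train)
          for name, (p, m) in portfolio.items()}
val_preds = {name: f(X_val) for name, f in fitted.items()}

members, ensemble_acc = greedy_ensemble(val_preds, y_val)
best_single = max(accuracy(p, y_val) for p in val_preds.values())
print("ensemble members:", members)
print("validation accuracy: ensemble", ensemble_acc, "best single", best_single)
```

Because the selection keeps the best prefix, the ensemble's validation accuracy cannot fall below that of the best single pipeline. That score is optimistic, since the same validation set drives the selection, so a separate held-out set would be needed for an honest estimate.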