Documat


Abstract of New cross-layer techniques for multi-criteria scheduling in large-scale systems

Alberto Cascajo García

  • The global information technology (IT) ecosystem is transitioning to a new generation of applications that require increasingly intensive data acquisition, processing, and storage. As a result of this shift towards data-intensive computing, there is a growing overlap between high-performance computing (HPC) and Big Data techniques, since many HPC applications produce large volumes of data and Big Data workloads need HPC capabilities.

    The hypothesis of this PhD thesis is that the interoperability and convergence of HPC and Big Data systems are crucial for the future, and that unifying both paradigms is essential to address a broad spectrum of research domains. For this reason, the main objective of this PhD thesis is to propose and develop a monitoring system that enables HPC and Big Data convergence by providing information about the behavior of applications in a system that executes both kinds of workloads, in order to improve scalability and data locality and to support adaptability in large-scale computers. To achieve this goal, this work focuses on the design of resource monitoring and discovery that exploits parallelism at all levels. The collected data are disseminated to facilitate global improvements across the whole system and thus avoid mismatches between layers. The result is a two-level monitoring framework (at both node and application level) with a low computational load that is scalable and can communicate with different modules through an API provided for this purpose. Combined with the techniques applied for fault tolerance, this makes the system robust and highly available.
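The two-level design described above can be illustrated with a minimal sketch. The class and field names below (NodeSample, MonitoringAPI, report_node, report_app) are hypothetical, not LIMITLESS's actual interface; they only show the idea of node-level and application-level samples reachable through one API that upper layers can query for a global view.

```python
import time
from dataclasses import dataclass, field

@dataclass
class NodeSample:
    """One node-level monitoring sample (fields are illustrative)."""
    node: str
    cpu_load: float  # fraction of CPU in use, 0.0-1.0
    mem_used: float  # fraction of memory in use, 0.0-1.0
    timestamp: float = field(default_factory=time.time)

class MonitoringAPI:
    """Toy two-level store: node-level samples plus per-application
    metrics, exposed through a single interface."""

    def __init__(self):
        self._node_samples = {}  # node name -> latest NodeSample
        self._app_metrics = {}   # app id -> dict of metrics

    def report_node(self, sample: NodeSample) -> None:
        self._node_samples[sample.node] = sample

    def report_app(self, app_id: str, metrics: dict) -> None:
        self._app_metrics[app_id] = metrics

    def least_loaded_node(self) -> str:
        """Global view that an upper layer (e.g. a scheduler) could use."""
        return min(self._node_samples.values(), key=lambda s: s.cpu_load).node

api = MonitoringAPI()
api.report_node(NodeSample("n0", cpu_load=0.9, mem_used=0.5))
api.report_node(NodeSample("n1", cpu_load=0.2, mem_used=0.4))
api.report_app("app-42", {"ipc": 1.7, "phase": "compute"})
print(api.least_loaded_node())  # → n1
```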

    On the other hand, the developed framework includes a task scheduler capable of managing the launch of applications, their migration between nodes, and the possibility of dynamically increasing or decreasing their number of processes. All of this is achieved through cooperation with other modules integrated into LIMITLESS, whose objective is to optimize the execution of a stack of applications based on multi-criteria policies. This scheduling mode is called coarse-grain scheduling based on monitoring.
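A multi-criteria placement decision of this kind can be sketched as a weighted score over monitored metrics. The criteria, weights, and function names below are illustrative assumptions, not the actual LIMITLESS policy; they only show how several metrics can be combined into one node ranking.

```python
def score(metrics: dict, weights: dict) -> float:
    """Weighted multi-criteria score for one node; lower is better.
    The criteria and weights here are purely illustrative."""
    return sum(weights[k] * metrics[k] for k in weights)

def schedule(nodes: dict, weights: dict) -> str:
    """Pick the node that minimizes the weighted score."""
    return min(nodes, key=lambda name: score(nodes[name], weights))

# Hypothetical monitored state of two nodes (all values in [0, 1]).
nodes = {
    "n0": {"cpu_load": 0.8, "mem_used": 0.3, "net_load": 0.1},
    "n1": {"cpu_load": 0.3, "mem_used": 0.6, "net_load": 0.2},
}
weights = {"cpu_load": 0.6, "mem_used": 0.3, "net_load": 0.1}
print(schedule(nodes, weights))  # → n1 (score 0.38 vs 0.58 for n0)
```

Changing the weights changes the policy: a memory-heavy weighting would send the same application elsewhere, which is the point of a multi-criteria design.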

    For better performance, and in order to further reduce monitoring overhead, optimizations have been applied at different levels to reduce communication between components while avoiding the loss of information. To achieve this objective, data-filtering techniques, Machine Learning (ML) algorithms, and Neural Networks (NN) have been used.
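One common data-filtering technique of this kind is a delta filter: a sample is transmitted only when it differs from the last transmitted value by more than a threshold. This is a generic sketch of that idea, not the specific filter used in the thesis.

```python
def delta_filter(samples, threshold):
    """Keep a sample only when it differs from the last kept value by
    more than `threshold`, trading a bounded precision loss for fewer
    messages between monitoring components."""
    kept, last = [], None
    for s in samples:
        if last is None or abs(s - last) > threshold:
            kept.append(s)
            last = s
    return kept

# Six CPU-load readings collapse to three messages.
readings = [0.50, 0.51, 0.52, 0.80, 0.81, 0.30]
print(delta_filter(readings, threshold=0.05))  # → [0.5, 0.8, 0.3]
```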

    In order to improve the scheduling process and to design new multi-criteria scheduling policies, the monitoring information has been combined with ML classification algorithms to identify applications and their execution phases through offline profiling. Thanks to this feature, LIMITLESS can detect which phase an application is executing and try to share the computational resources with other compatible applications (those that suffer no performance degradation when running at the same time). This feature is called fine-grain scheduling and can reduce the makespan of the use cases while making efficient use of computational resources that other applications do not use.
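The co-scheduling step above can be sketched as a lookup over phase labels. The phase names and the compatibility rule below (CPU-bound phases pair with memory- or I/O-bound ones) are simplifying assumptions standing in for the classifier output and the degradation-based criterion described in the abstract.

```python
# Hypothetical phase labels as an offline classifier might produce them,
# and a hand-written compatibility relation standing in for the
# "no mutual performance degradation" criterion.
COMPATIBLE = {("cpu", "mem"), ("mem", "cpu"), ("cpu", "io"), ("io", "cpu")}

def can_coschedule(phase_a: str, phase_b: str) -> bool:
    return (phase_a, phase_b) in COMPATIBLE

def pick_partner(running_phase: str, queue):
    """Return the first queued application whose current phase is
    compatible with the application already running on the node."""
    for app, phase in queue:
        if can_coschedule(running_phase, phase):
            return app
    return None  # nothing compatible: the node is not shared

queue = [("appA", "cpu"), ("appB", "mem"), ("appC", "io")]
print(pick_partner("cpu", queue))  # → appB
```

A real implementation would derive the compatibility relation from measured slowdowns rather than a fixed table, but the scheduling decision has the same shape.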

