Técnicas de optimización de rutinas paralelas de álgebra lineal en sistemas heterogéneos

Jesús Cámara Moreno

Ayuda

Técnicas de optimización de rutinas paralelas de álgebra lineal en sistemas heterogéneos

Autores: Jesús Cámara Moreno
Directores de la Tesis: Domingo Giménez Cánovas (dir. tes.) , Antonio Javier Cuenca Muñoz (dir. tes.)
Lectura: En la Universidad de Murcia ( España ) en 2020
Idioma: español
Número de páginas: 250
Tribunal Calificador de la Tesis: Leonel Augusto Pires Seabra De Sousa (presid.) , Pedro Enrique López de Teruel Alcolea (secret.) , Victor Manuel García Molla (voc.)
Enlaces
- Tesis en acceso abierto en: DIGITUM
Resumen
- español
  El objetivo principal de esta Tesis es el desarrollo de una metodología de auto-optimización jerárquica para rutinas de álgebra lineal en entornos computacionales heterogéneos. Esta metodología ha sido diseñada de forma que permita la obtención de rutinas que sean ejecutadas de forma eficiente desde el nivel más bajo de la jerarquía, tanto hardware como software, hasta llegar al nivel más alto en cada caso.
  
  En lo referente al hardware, los componentes que se consideran a más bajo nivel son los elementos computacionales básicos: CPU multicore, GPU y MIC (Xeon Phi).
  
  Los parámetros a tener en cuenta para optimizar las rutinas son distintos en cada uno de ellos, pero la metodología debe ser válida para todos. Estos elementos computacionales, a su vez, pueden agruparse para formar otros de mayor nivel (nodo).
  
  Generalmente, hoy en día un nodo computacional estándar suele estar formado por una CPU multicore y uno o varios coprocesadores, que en algunos casos son de distinto tipo (GPU y/o MIC) e incluso pueden tener diferente capacidad computacional aunque sean del mismo tipo. La metodología, por tanto, debe considerar cómo combinar la información de auto-optimización a nivel de elemento computacional básico, para guiar la optimización a nivel de nodo. Los nodos, a su vez, se pueden agrupar en clusters, homogéneos o heterogéneos, por lo que la metodología jerárquica ha de ser capaz, a su vez, de obtener la información de optimización a nivel de nodo para guiar la optimización de las rutinas a nivel de cluster.
  
  En cuanto al software (rutinas de álgebra lineal), se comienza por el nivel correspondiente a las rutinas básicas, principalmente la multiplicación de matrices, pues constituye el componente computacional básico sobre el que se implementan otras rutinas eficientes de álgebra lineal de nivel superior en los entornos computacionales actuales. A continuación, la metodología se extiende a rutinas de mayor nivel, como la multiplicación de Strassen o la factorización LU, las cuales utilizarán internamente la rutina de multiplicación de matrices previamente optimizada. Del mismo modo, en un siguiente nivel de la jerarquía se podrá considerar la ejecución de códigos científicos que realicen llamadas a rutinas del nivel inferior, como pueden ser simulaciones en las que se lleve a cabo la resolución simultánea de varios sistemas de ecuaciones. Además, dado que existen multitud de librería de álgebra lineal optimizadas y, en algunos casos, paralelizadas para distintos sistemas, se hace uso de las rutinas que contienen y se analiza la aplicación de la metodología de auto-optimización jerárquica considerando diferentes librerías. De esta manera, se puede mostrar su validez independientemente de las rutinas y librerías básicas utilizadas.
  
  El proceso de auto-optimización tiene como principal objetivo obtener rutinas eficientes en los sistemas computacionales para los que se diseñan, para lo cual es necesario seleccionar los valores a utilizar para cada uno de los parámetros del conjunto de parámetros ajustables que lleven a la obtención de tiempos de ejecución cercanos al óptimo experimental. La elección del conjunto de parámetros depende tanto de la rutina como del entorno computacional. Por tanto, la metodología de auto-optimización debe garantizar que es válida para múltiples rutinas, librerías y sistemas computacionales, de forma que sea fácilmente extensible en cualquiera de estas dimensiones.
  
  Para alcanzar el objetivo principal establecido, la metodología de trabajo utilizada comenzará analizando el comportamiento de la metodología de auto-optimización jerárquica realizando un estudio por niveles (tanto hardware como software), considerando diferentes escenarios (sistemas y rutinas) y parámetros en cada nivel hasta llegar al mayor nivel considerado en cada caso, de forma que las conclusiones sean generales y no dependan de la rutina o de un sistema concreto.
  
  Los experimentos se llevarán a cabo sobre plataformas heterogéneas compuestas por nodos con distinto número y tipo de unidades de procesamiento paralelo (CPU, GPU, MIC). Se considerarán varias configuraciones de unidades de cómputo, por ejemplo, multicores con distinto número de cores o agrupaciones de nodos a nivel de cluster, con el fin de obtener diferentes grados de heterogeneidad. Una vez analizada la metodología en el primer nivel hardware, se pasará al siguiente nivel de elementos de computación, analizando qué información del nivel anterior se puede utilizar para acelerar la toma de decisiones en el nivel actual manteniendo, al mismo tiempo, la calidad del proceso de optimización. Del mismo modo, tras analizar las rutinas de un nivel, se pasará al siguiente nivel de la jerarquía software, recorriendo de nuevo, de forma incremental, los niveles de la jerarquía hardware. Así, mientras que en el nivel básico de las rutinas puede ser suficiente trabajar con la multiplicación de matrices (al ser el núcleo computacional en el que se basan rutinas de mayor nivel), en el siguiente nivel se podrán considerar rutinas que invoquen a la multiplicación de matrices y hagan uso de parámetros de distinto tipo, como el nivel de recursión en la multiplicación de Strassen o el tamaño de bloque en las factorizaciones matriciales.
  
  La labor de investigación realizada en esta Tesis ha culminado con la publicación de un artículo en una revista de carácter científico e internacional. En este artículo se abordan los contenidos del núcleo de la Tesis y se muestran los principales resultados de investigación obtenidos con la metodología de auto-optimización jerárquica. Los resultados son satisfactorios y muestran la efectividad de aplicar la metodología siguiendo un enfoque jerárquico en los niveles hardware y software. De esta forma se constata, por un lado, el interés científico que tiene el trabajo realizado y, por otro, la validez de la propia metodología, dada su capacidad para adaptarse a cualquier tipo de sistema computacional y de ser extendida con otras rutinas, librerías y técnicas de selección de los valores de los parámetros (algorítmicos y del sistema).
  
  El objetivo es seguir avanzando en la misma línea, extendiendo la metodología con nueva funcionalidad y considerando su inclusión en software de mayor nivel que haga uso de ella para conseguir una ejecución optimizada en la resolución de problemas científicos y de ingeniería.
- English
  The aim of this Thesis is the development of a hierarchical auto-tuning methodology for linear algebra routines in heterogeneous computational environments. This methodology has been designed to allow obtaining routines executed eficiently from the lowest level of the hierarchy, both hardware and software, to the highest level in each case.
  
  In terms of hardware, the components considered at the lowest level are the basic computational elements: multicore CPU, GPU and MIC (Xeon Phi). The parameters to take into account to optimize the routines are diferent in each of them, but the methodology must be valid in all cases. These computational elements can be grouped into higher level components (nodes), which generally consists of a multicore CPU and one or more coprocessor, in some cases of a diferent type (GPU or MIC) or with diferent computational power even if they are of the same type. Therefore, the methodology must consider how to combine the auto-tuning information from the basic computational elements to guide the optimization at node level. The nodes, in turn, can be grouped into clusters, either homogeneous or heterogeneous, so the hierarchical methodology must be able to obtain the optimization information at node level to guide the optimization routines at cluster level.
  
  As for the software (linear algebra routines), the process starts with basic routines, mainly the matrix multiplication, since it is the basic computational component used for the implementation of other eficient higher-level linear algebra routines. The methodology is then extended to higher level routines, such as the Strassen multiplication or the LU factorization, which will use the previously optimized matrix multiplication routine. Similarly, in the next level of the hierarchy, it could be considered the execution of scientific codes that call lower level routines, such as simulations in which the resolution of several systems of equations is performed simultaneously. In addition, since there are multiple optimized linear algebra libraries and, in some cases, parallelized for diferent systems, these routines are used and the application of the hierarchical auto-tuning methodology is analyzed considering diferent libraries. In this way, the efectiveness of the methodology can be shown independently of the routines and libraries used.
  
  The main goal of the auto-tuning process is to obtain eficient routines in the computational systems for which they are designed. Therefore, it is necessary to select the values of the parameters of a set of adjustable parameters that lead to execution times close to the experimental optima. The choice of the set of parameters depend on both the routine and the computational system. Thus, the auto-tuning methodology must ensure its validity for multiple routines, libraries and computing systems, so that it can be easily extended in any dimension.
  
  To achieve the main goal established, the working methodology used will begin by analysing the behaviour of the hierarchical auto-tuning methodology by performing a study by levels (both hardware and software), considering diferent scenarios (systems and routines) and parameters at each level until reaching the highest level considered in each case, so that the conclusions are general and do not depend on the routine or a specific system.
  
  The experiments will be carried out on heterogeneous platforms made up of nodes with diferent number and type of parallel processing elements (CPU, GPU, MIC). Several configurations of computing units will be considered, e.g. multicores with diferent number of cores or groups of nodes at cluster level, in order to obtain diferent degrees of heterogeneity. Once the methodology is analyzed at the first level of hardware, the process moves to the next level of computing elements, analyzing the information from the previous level that can be used to accelerate the decisión taken at the current level while maintaining the quality of the optimization process.
  
  Similarly, after analyzing the routines at one level, the process will continue to the next level of the software hierarchy, moving through the levels of the hardware hierarchy incrementally. Thus, while at the level of basic routines it is enough to work with the matrix multiplication, in the next level it could consider routines that internally call that routine and use diferent parameters, such as the recursion level in the Strassen multiplication or the block size in matrix factorizations, such as LU.
  
  The research work carried out in this Thesis has culminated with the publication of an article in an international journal. This article describes the main ideas developed in the Thesis and shows the main results obtained with the proposed hierarchical auto-tuning methodology. The results are satisfactory and show the efectiveness of applying a hierarchical approach both in the hardware and software levels. In this way, the validity of the proposed methodology is confirmed, due to its capacity to be adapted to any type of computing system and to be extended with other routines, libraries and techniques for the selection of the values of the algorithmic and system parameters. The goal is to continue advancing in the same direction, extending the methodology with new features and considering its inclusión within higher level software that can make use of it to achieve an optimized execution. It is expected that these advances allow the eficient application of the hierarchical auto-tuning methodology in solving scientific and engineering problems.