Documat


Mechanisms and techniques for scheduling in supercomputers

  • Author: Jose Antonio Pascual Saiz
  • Thesis supervisors: José Antonio Lozano Alonso (supervisor), José Miguel Alonso (supervisor)
  • Defense: Universidad del País Vasco - Euskal Herriko Unibertsitatea (Spain), 2013
  • Language: Spanish
  • Thesis committee: Clemente Rodríguez Lafuente (chair), Alexander Mendiburu Alberro (secretary), José Ángel Gregorio Monasterio (member), Francisco Fernández de Vega (member), Javier Navaridas Palma (member)
  • Full text not available
  • Abstract
    • This thesis analyzes the performance of the scheduling process in space-shared, large-scale supercomputers. These systems are specifically designed to run fine-grained parallel applications with a high communication-to-computation ratio. How the interconnection network is used has a significant bearing on application performance and, therefore, on overall system performance. The scheduling process can be divided into three stages, each driven by a set of policies or strategies. Assuming that users send parallel jobs to a single scheduling queue, (1) a job is selected to run, then (2) the resources (set of nodes) required by the job have to be located in the system and reserved for it, and (3) the job's tasks have to be mapped onto the selected nodes. This dissertation studies ways of improving the performance of the scheduling process, focusing on stages 2 (partitioning) and 3 (mapping). In particular, we use contiguous partitioning as the strategy to assign partitions to jobs. Contiguous partitioning strategies have a well-known disadvantage: high fragmentation, which results in low levels of system utilization. However, they provide jobs with a running environment that, due to the locality of communications and the lack of interference with other running jobs, substantially reduces running times. In order to exploit these advantages effectively, an appropriate task-to-node mapping has to be implemented. Through extensive simulation-based experimentation, it is demonstrated that combinations of contiguous partitioning and application-aware mappings achieve excellent job throughput compared to non-contiguous partitioning alternatives. Although the main topic of our work is contiguous partitioning, we have also explored some aspects of non-contiguous partitioning strategies, due to their common use in production environments.
Regarding system topology, we mainly focus on cube-shaped topologies such as meshes and tori, but we also study alternative partitioning strategies for tree topologies.

