An MPI-CUDA library for image processing on HPC architectures

Antonella Galizia; Daniele D'Agostino; Andrea Clematis

Ayuda

An MPI-CUDA library for image processing on HPC architectures

Autores: Antonella Galizia, Daniele D'Agostino, Andrea Clematis
Localización: Journal of computational and applied mathematics, ISSN 0377-0427, Vol. 273, Nº 1, 2015, págs. 414-427
Idioma: inglés
DOI: 10.1016/j.cam.2014.05.004
Texto completo no disponible (Saber más ...)
Resumen
- Scientific image processing is a topic of interest for a broad scientific community since it is a mean of gaining understanding and insight into the data for a growing number of applications. Furthermore, the technological evolution permits large data acquisition, with sophisticated instruments, and their elaboration through complex multidisciplinary applications, resulting in datasets that are growing at an extremely rapid pace. This results in the need of huge computational power for the processing. It is necessary to move towards High Performance Computing (HPC) and to develop proper parallel implementations of image processing algorithms/operations. Modern HPC resources are typically highly heterogeneous systems, composed of multiple CPUs and accelerators such as Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs). The actual barrier posed by heterogeneous HPC resources is the development and/or the performance efficient porting of software on such complex architectures. In this context, the aim of this work is to enable image processing on cluster of GPUs, through the use of PIMA(GE)2 Lib, the Parallel IMAGE processing GEnoa Library. The library is able to exploit traditional clusters through MPI, GPU device through CUDA and a first experimentation is aimed to explore the use of GPU-clusters. Library operations are provided to the users through a sequential interface defined to hide the parallelism of the computation. The parallel computation, at each level, is managed employing specific policies designed to suitably coordinate the parallel processes/threads involved in the elaboration and their use is tightly coupled with the PIMA(GE)2 Lib interface. In this paper, we present the incremental approach adopted in the development of the library and the performance gains in each implementations: quite linear speedup is achieved on cluster architecture, about a 30% improvement in the execution time on a single GPU and the first results on cluster of GPUs are promising.