


Abstract of Performance Analysis of the Multi-pass Transformation for Complex 3d-Stencils on GPUs

Siham Tabik, Luis F. Romero, Emilio López Zapata

  • Complex iterative 3D stencils based on a series of multiple simpler stencils with different computational intensities cannot be handled properly using standard techniques on the GPU. This work demonstrates that decomposing this kind of stencil into a sequence of simpler stencils, up to a specific number of them, and further optimizing each individual kernel provides the best overall performance. We focus on the family of PDE-based denoising methods, which can be reformulated as a sequence of multiple stencil-based tasks. The performance results and analysis show that there exists an optimal level of splitting-coalescence of these stencil-based tasks that reaches the best compromise between better use of fast memories and higher concurrency.
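The multi-pass idea in the abstract can be illustrated with a small sketch. It is not the authors' denoising kernel and it runs on the CPU in plain Python, not on a GPU. As a stand-in for a complex stencil it uses a 7-point 3D Laplacian, computed once as a single fused pass and once as two simpler passes (a forward-difference gradient followed by a backward-difference divergence). All function names (`fused_laplacian`, `pass_gradient`, `pass_divergence`, `multipass_laplacian`) and the grid size are assumptions made for illustration.

```python
# Illustrative sketch only: a stand-in composite stencil (a 3D Laplacian),
# evaluated as one fused pass and as a sequence of two simpler passes.
# Names and sizes are hypothetical; this is not the paper's implementation.

def make_grid(n, fill=0.0):
    """Return an n x n x n nested list filled with `fill`."""
    return [[[fill for _ in range(n)] for _ in range(n)] for _ in range(n)]

def fused_laplacian(u):
    """Single pass: 7-point Laplacian on interior points."""
    n = len(u)
    out = make_grid(n)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                out[i][j][k] = (u[i + 1][j][k] + u[i - 1][j][k]
                                + u[i][j + 1][k] + u[i][j - 1][k]
                                + u[i][j][k + 1] + u[i][j][k - 1]
                                - 6.0 * u[i][j][k])
    return out

def pass_gradient(u):
    """Pass 1: forward differences, stored in three intermediate grids."""
    n = len(u)
    gx, gy, gz = make_grid(n), make_grid(n), make_grid(n)
    for i in range(n - 1):
        for j in range(n - 1):
            for k in range(n - 1):
                gx[i][j][k] = u[i + 1][j][k] - u[i][j][k]
                gy[i][j][k] = u[i][j + 1][k] - u[i][j][k]
                gz[i][j][k] = u[i][j][k + 1] - u[i][j][k]
    return gx, gy, gz

def pass_divergence(gx, gy, gz):
    """Pass 2: backward-difference divergence of the gradient field."""
    n = len(gx)
    out = make_grid(n)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                out[i][j][k] = ((gx[i][j][k] - gx[i - 1][j][k])
                                + (gy[i][j][k] - gy[i][j - 1][k])
                                + (gz[i][j][k] - gz[i][j][k - 1]))
    return out

def multipass_laplacian(u):
    """Two simpler stencils applied in sequence."""
    return pass_divergence(*pass_gradient(u))

def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two grids."""
    n = len(a)
    return max(abs(a[i][j][k] - b[i][j][k])
               for i in range(n) for j in range(n) for k in range(n))

# Small integer-valued test field, so the arithmetic is exact.
n = 6
u = [[[float(i * i + 2 * j * j + 3 * k * k) for k in range(n)]
      for j in range(n)] for i in range(n)]
fused = fused_laplacian(u)
multi = multipass_laplacian(u)
print(max_abs_diff(fused, multi))
```

Both versions produce the same result here. The difference lies in the cost profile: the fused pass reads a wider neighbourhood per output point, while the multi-pass version writes and re-reads intermediate grids but keeps each pass simple. On a GPU, each pass would map to its own kernel, which is the trade-off between fast-memory use and concurrency that the abstract refers to.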

