Abstract of Performance Analysis of the Multi-pass Transformation for Complex 3d-Stencils on GPUs

Siham Tabik, Luis F. Romero, Emilio López Zapata

Complex iterative 3d stencils based on a series of multiple simpler stencils with different computation intensities cannot be handled properly using standard techniques on the GPU. This work demonstrates that decomposing this kind of stencil into a sequence of up to a specific number of simpler stencils and further optimizing each individual kernel provides the best overall performance. We focus on the family of PDE-based denoising methods, which can be reformulated as a sequence of multiple stencil-based tasks. The performance results and analysis show that there exists an optimal level of splitting-coalescence of these stencil-based tasks that reaches the best compromise between better use of fast memories and higher concurrency.
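
The splitting-coalescence trade-off can be illustrated with a toy example. The sketch below is not the paper's denoising kernels and does not run on a GPU; it is a plain Python illustration under stated assumptions: a hypothetical composite stencil built by applying a 7-point average twice on a small periodic grid. The names `avg7`, `multi_pass`, `fused`, and the grid size `N` are invented for this example. The multi-pass version stores an intermediate grid between two simple passes, while the fused version produces the same result in one pass by recomputing the inner stencil at every neighbor.

```python
# Illustrative sketch only: a toy composite stencil, not the authors' kernels.
N = 4  # hypothetical grid size; boundaries are periodic

# Center cell plus its six face neighbors.
OFFSETS = [(0, 0, 0), (1, 0, 0), (-1, 0, 0),
           (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def avg7(read, i, j, k):
    """7-point average around (i, j, k); `read(i, j, k)` returns a cell value."""
    return sum(read(i + di, j + dj, k + dk) for di, dj, dk in OFFSETS) / 7.0


def make_reader(grid):
    """Return a function that reads `grid` with periodic wrap-around."""
    return lambda i, j, k: grid[i % N][j % N][k % N]


def apply_pass(grid):
    """One simple stencil pass over the whole grid."""
    read = make_reader(grid)
    return [[[avg7(read, i, j, k) for k in range(N)]
             for j in range(N)] for i in range(N)]


def multi_pass(grid):
    """Split form: two simple passes with an intermediate grid in memory."""
    tmp = apply_pass(grid)
    return apply_pass(tmp)


def fused(grid):
    """Coalesced form: one pass, inner stencil recomputed for each neighbor."""
    read = make_reader(grid)

    def inner(i, j, k):
        return avg7(read, i, j, k)

    return [[[avg7(inner, i, j, k) for k in range(N)]
             for j in range(N)] for i in range(N)]


grid = [[[float((7 * i + 3 * j + k) % 5) for k in range(N)]
         for j in range(N)] for i in range(N)]

split_result = multi_pass(grid)
fused_result = fused(grid)
```

In this toy setting the fused form avoids the intermediate grid but evaluates the inner stencil seven times per output cell, whereas the split form evaluates it once per cell at the cost of writing and re-reading `tmp`. This is only meant to make the two ends of the trade-off concrete; the optimal level of splitting reported in the abstract depends on the actual kernels and hardware.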
