Siham Tabik, Luis F. Romero, Emilio López Zapata
Complex iterative 3D stencils built from a series of multiple simpler stencils with different computational intensities cannot be handled properly using standard techniques on the GPU. This work demonstrates that decomposing this kind of stencil into a sequence of simpler stencils, up to a specific number, and then optimizing each individual kernel provides the best overall performance. We focus on the family of PDE-based denoising methods, which can be reformulated as a sequence of multiple stencil-based tasks. The performance results and analysis show that there exists an optimal level of splitting-coalescence of these stencil-based tasks that reaches the best compromise between better use of fast memories and higher concurrency.
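As an illustration of the splitting-coalescence idea only, the following CPU sketch in plain Python writes one 3D stencil in two equivalent ways: as a sequence of simple difference stencils with intermediate arrays (split), and as a single 7-point stencil (coalesced). The Laplacian is used here as an assumed stand-in for a diffusion-type denoising step; the function names, the boundary handling, and the choice of stencil are hypothetical and are not taken from the paper, whose kernels target the GPU.

```python
# Hypothetical sketch: names and the chosen stencil are assumptions,
# not the authors' GPU kernels.

def make_grid(n, f):
    """Build an n x n x n nested-list grid with entries f(i, j, k)."""
    return [[[f(i, j, k) for k in range(n)] for j in range(n)] for i in range(n)]

def forward_diff(u, axis):
    """Simple stencil: forward difference along one axis (0 at the far boundary)."""
    n = len(u)
    def at(i, j, k):
        idx = [i, j, k]
        if idx[axis] + 1 >= n:
            return 0
        nxt = list(idx)
        nxt[axis] += 1
        return u[nxt[0]][nxt[1]][nxt[2]] - u[i][j][k]
    return make_grid(n, at)

def backward_diff(g, axis):
    """Simple stencil: backward difference along one axis (0 at the near boundary)."""
    n = len(g)
    def at(i, j, k):
        idx = [i, j, k]
        if idx[axis] == 0:
            return 0
        prv = list(idx)
        prv[axis] -= 1
        return g[i][j][k] - g[prv[0]][prv[1]][prv[2]]
    return make_grid(n, at)

def split_laplacian(u):
    """Split form: six simple stencil passes plus a pointwise sum,
    each pass reading and writing a full intermediate array."""
    n = len(u)
    parts = [backward_diff(forward_diff(u, axis), axis) for axis in range(3)]
    return make_grid(n, lambda i, j, k: sum(p[i][j][k] for p in parts))

def fused_laplacian(u):
    """Coalesced form: one 7-point stencil with no intermediate arrays.
    Boundary points are simply set to 0 in this sketch."""
    n = len(u)
    def at(i, j, k):
        if min(i, j, k) == 0 or max(i, j, k) == n - 1:
            return 0
        return (u[i + 1][j][k] + u[i - 1][j][k]
                + u[i][j + 1][k] + u[i][j - 1][k]
                + u[i][j][k + 1] + u[i][j][k - 1]
                - 6 * u[i][j][k])
    return make_grid(n, at)

def interior_equal(a, b):
    """Compare two grids on interior points only."""
    n = len(a)
    r = range(1, n - 1)
    return all(a[i][j][k] == b[i][j][k] for i in r for j in r for k in r)

if __name__ == "__main__":
    u = make_grid(5, lambda i, j, k: i * i + 2 * j ** 3 + 3 * k + i * j * k)
    print(interior_equal(split_laplacian(u), fused_laplacian(u)))
```

Both forms agree on interior points; they differ in how much intermediate data is written and re-read versus how much work each pass does, which is the kind of trade-off the abstract describes between use of fast memories and concurrency.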