Revisiting Strassen’s Matrix Multiplication for Multicore Systems

    1. [1] Universidad de Murcia, Murcia, Spain
  • Published in: Annals of Multicore and GPU Programming: AMGP, ISSN 2341-3158, Vol. 4, No. 1, 2017, pp. 1-8
  • Language: English
  • Abstract
    • Strassen’s matrix multiplication reduces the computational cost of multiplying two n × n matrices from the O(n^3) of a typical three-loop implementation to approximately O(n^2.81). The reduction comes at the expense of additional O(n^2) operations, and extra memory is needed for the temporary results of the recursive calls. The advantage of Strassen’s algorithm therefore only appears for sufficiently large matrices, and it requires careful implementation. The growing speed of systems in which several cores share a common memory space also makes it harder for the algorithm to compete against highly optimized three-loop multiplications. This paper discusses the aspects that need to be addressed when designing Strassen multiplications for today’s multicore systems (a minimal sketch of the recursion follows below).
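As a point of reference for the abstract above, the following is a minimal sketch of Strassen’s recursion in C, assuming square row-major matrices whose order n is a power of two and a plain three-loop product below a cutoff. The paper itself targets optimized GEMM kernels (e.g. Intel MKL) and machine-tuned cutoffs, so the CUTOFF value, buffer layout, and function names here are illustrative assumptions, not the authors’ implementation.

/* Sketch of Strassen's recursive multiplication C = A * B for square,
 * row-major matrices whose order n is a power of two.  The base-case
 * cutoff is an assumed, tunable value; in practice an optimized GEMM
 * (e.g. MKL dgemm) would be called instead of the naive kernel. */
#include <stdlib.h>
#include <string.h>

/* naive three-loop product: C = A * B, all n x n, row-major */
static void gemm_naive(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}

/* copy the h x h block starting at (r, c) of an n x n matrix into a packed buffer */
static void pack(int n, const double *M, int r, int c, int h, double *dst) {
    for (int i = 0; i < h; i++)
        memcpy(dst + i * h, M + (r + i) * n + c, h * sizeof(double));
}

static void add(int h, const double *X, const double *Y, double *Z) {
    for (int i = 0; i < h * h; i++) Z[i] = X[i] + Y[i];
}
static void sub(int h, const double *X, const double *Y, double *Z) {
    for (int i = 0; i < h * h; i++) Z[i] = X[i] - Y[i];
}

void strassen(int n, const double *A, const double *B, double *C) {
    const int CUTOFF = 128;              /* assumed threshold; tune per machine */
    if (n <= CUTOFF) { gemm_naive(n, A, B, C); return; }

    int h = n / 2;
    /* temporaries: 4 sub-blocks of A, 4 of B, 7 products, 2 scratch buffers */
    double *buf = malloc((size_t)17 * h * h * sizeof(double));
    double *A11 = buf,         *A12 = A11 + h * h, *A21 = A12 + h * h, *A22 = A21 + h * h;
    double *B11 = A22 + h * h, *B12 = B11 + h * h, *B21 = B12 + h * h, *B22 = B21 + h * h;
    double *M   = B22 + h * h;           /* M[0..6]: the seven Strassen products */
    double *T1  = M + 7 * h * h, *T2 = T1 + h * h;

    pack(n, A, 0, 0, h, A11); pack(n, A, 0, h, h, A12);
    pack(n, A, h, 0, h, A21); pack(n, A, h, h, h, A22);
    pack(n, B, 0, 0, h, B11); pack(n, B, 0, h, h, B12);
    pack(n, B, h, 0, h, B21); pack(n, B, h, h, h, B22);

    add(h, A11, A22, T1); add(h, B11, B22, T2); strassen(h, T1, T2, M + 0 * h * h);  /* M1 */
    add(h, A21, A22, T1);                       strassen(h, T1, B11, M + 1 * h * h); /* M2 */
    sub(h, B12, B22, T2);                       strassen(h, A11, T2, M + 2 * h * h); /* M3 */
    sub(h, B21, B11, T2);                       strassen(h, A22, T2, M + 3 * h * h); /* M4 */
    add(h, A11, A12, T1);                       strassen(h, T1, B22, M + 4 * h * h); /* M5 */
    sub(h, A21, A11, T1); add(h, B11, B12, T2); strassen(h, T1, T2, M + 5 * h * h);  /* M6 */
    sub(h, A12, A22, T1); add(h, B21, B22, T2); strassen(h, T1, T2, M + 6 * h * h);  /* M7 */

    /* combine: C11 = M1+M4-M5+M7, C12 = M3+M5, C21 = M2+M4, C22 = M1-M2+M3+M6 */
    for (int i = 0; i < h; i++)
        for (int j = 0; j < h; j++) {
            int k = i * h + j;
            C[i * n + j]           = M[k] + M[3*h*h + k] - M[4*h*h + k] + M[6*h*h + k];
            C[i * n + j + h]       = M[2*h*h + k] + M[4*h*h + k];
            C[(i + h) * n + j]     = M[1*h*h + k] + M[3*h*h + k];
            C[(i + h) * n + j + h] = M[k] - M[1*h*h + k] + M[2*h*h + k] + M[5*h*h + k];
        }
    free(buf);
}

A realistic multicore variant would replace gemm_naive with a vendor GEMM and could run the seven independent sub-products concurrently (for example as OpenMP tasks); choosing the cutoff and managing the extra O(n^2) additions and temporary storage are exactly the design aspects the paper discusses.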
