HPC en simulación y control a gran escala

Peter Benner; Pablo Ezzatti; Hermann Mena; Enrique Salvador Quintana Ortí; Alfredo Remón Gómez

Benner, Peter ^[1]; Ezzatti, Pablo ^[2]; Mena, Hermann ^[3]; Quintana-Ortí, Enrique S. ^[4]; Remón, Alfredo ^[4]
1. [1] Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
2. [2] Universidad de la República, Uruguay
3. [3] University of Innsbruck, Innsbruck, Austria
4. [4] Universitat Jaume I, Castellón, Spain
Published in: Elementos, ISSN-e 2248-5252, Vol. 3, No. 3, 2013 (issue dedicated to: Elementos), pp. 9-35
Language: Spanish
DOI: 10.15765/e.v3i3.412
Parallel titles:
- HPC in simulation and large scale control
Abstract
The simulation and control of phenomena arising in microelectronics, micromechanics, electromagnetism, fluid dynamics and, more generally, many industrial processes is a challenging task, mainly because of the high computational cost of the algorithms involved. Most of the mathematical models describing these phenomena have a large dimension; for example, modeling a microprocessor leads to a large-scale dynamical system that cannot be solved using conventional numerical methods. Instead, high performance computing (HPC) techniques have to be applied to deal with these problems. In this paper we review HPC tools that make it possible to simulate and control large-scale problems. Specifically, we focus on model reduction via balanced truncation and on the solution of linear-quadratic control problems, both of which can be implemented efficiently on shared-memory multi-core platforms equipped with one or more graphics processors (GPUs).