Abstract
As computer architecture evolves towards exaflop/s, the dominant trend is rapidly increasing floating-point performance (the so-called “free” flops) accompanied by much more slowly improving memory and network bandwidth. Numerical simulation will therefore face the challenge posed by the unbalanced growth of compute power relative to the capability to move data. In this paper, after reviewing the hardware and software challenges of moving towards exascale computing, we present co-design thinking for selecting, optimizing, and developing numerical algorithms and simulation tools that meet the challenge of simulation at extreme scale. Examples demonstrate this new way of thinking and its effectiveness on emerging architectures.
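The compute-versus-bandwidth imbalance can be made concrete with a roofline-style bound: attainable performance is the lesser of peak compute and memory bandwidth times arithmetic intensity (flops per byte moved). The Python sketch below is illustrative only. The roofline formulation is a standard performance model, not a method taken from this paper, and the node figures (1000 GFLOP/s peak, 100 GB/s bandwidth) and the sparse matrix-vector intensity estimate are hypothetical assumptions.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: performance is capped by peak compute or by
    memory bandwidth (GB/s) times arithmetic intensity (flop/byte)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical node figures, chosen only for illustration.
PEAK = 1000.0  # GFLOP/s
BW = 100.0     # GB/s

# Machine balance: flops the node can execute per byte it can load.
balance = PEAK / BW

# Assumed double-precision sparse matrix-vector product (CSR): about
# 2 flops per nonzero, with at least 8 bytes (value) + 4 bytes (column
# index) loaded per nonzero, ignoring vector and row-pointer traffic.
spmv_intensity = 2.0 / (8 + 4)

perf = attainable_gflops(PEAK, BW, spmv_intensity)
# Doubling peak compute ("free" flops) without adding bandwidth:
perf_doubled = attainable_gflops(2 * PEAK, BW, spmv_intensity)

print(f"machine balance = {balance:.1f} flop/byte")
print(f"SpMV bound      = {perf:.1f} GFLOP/s ({100 * perf / PEAK:.1f}% of peak)")
print(f"with 2x peak    = {perf_doubled:.1f} GFLOP/s")
```

Under these assumptions the memory-bound kernel reaches only a few percent of peak, and doubling `PEAK` leaves the bound unchanged. This is the sense in which additional flops are "free" but do not by themselves speed up bandwidth-limited simulation.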
References
DOE Exascale Initiative Roadmap (2009) Architecture and technology workshop. San Diego
DOE Office of Science Summary report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee. The opportunities and challenges of exascale computing, Fall 2010
Shalf J, Dosanjh S, Morrison J (2011) Exascale computing technology challenges. VECPAR 2010. LNCS 6449:1–25
Tian R, Sun N (2013) Some considerations about exascale computing in China. Commun China Comput Fed 9(2):52–60 (in Chinese)
Tian R (2013) Co-design thinking towards exascale computing. Inf Technol Lett 10(3):50–63 (in Chinese)
Thibodeau P (2012) Exascale unlikely before 2020 due to budget woes. Computerworld. Nov 19, 2012
Harrod W (2012) DOE exascale computing Initiative (ECI) update. DOE, Office of Science (SC), Oct 4, 2012
Dongarra J (2013) Emerging heterogeneous technologies for high performance computing. 22nd International Heterogeneity in Computing Workshop, IPDPS, Boston
DOE E3 Report, http://www.er.doe.gov/ascr/ProgramDocuments/ProgDocs.html
A platform strategy for the Advanced Simulation and Computing Program (NA-ASC-113R-07-Vol. 1-Rev. 0)
Chen J, Bell J (2011) Combustion exascale co-design center. Sixth international exascale software project workshop, San Francisco, April 6–7
Dennis JM, Edwards J, Guba O, St-Cyr A, Taylor MA, Worley PH (2012) CAM-SE: a scalable spectral element dynamical core for the community atmosphere model. Int J High Perform Comput Appl 26(1):74–89
Eisenbach M, Zhou CG, Nicholson DM, Brown G, Larkin J, Schulthess TC (2010) Thermodynamics of magnetic systems from first principles: WL-LSMS. In: Proceedings of the 52nd Cray User Group meeting, CUG 2010
http://nvworld.ru/files/articles/calculations-on-gpu-advantages-fermi/fermipeformance.pdf
Tian R (2013) Meshfree/GFEM in hardware-efficiency prospective. Interaction and multiscale mechanics. DOI:10.12989/imm.2013.6.2.000
Tian R (2013) Extra-dof-free and linearly independent enrichments in GFEM. Comput Method Appl Mech Eng 266:1–22
Babuška I, Melenk JM (1997) Partition of unity method. Int J Numer Method Eng 40:727–758
Melenk JM, Babuška I (1996) The partition of unity finite element method: basic theory and applications. Comput Method Appl Mech Eng 139:289–314
Babuška I, Caloz G, Osborn JE (1994) Special finite element methods for a class of second order elliptic problems with rough coefficients. SIAM J Numer Anal 31:945–981
Duarte CA, Oden JT (1996) An h-p adaptive method using clouds. Comput Methods Appl Mech Eng 139(1–4):237–262
Oden JT, Duarte CA, Zienkiewicz OC (1998) A new cloud-based hp finite element method. Comput Method Appl Mech Eng 153(1–2):117–126
Strouboulis T, Babuška I, Copps K (2000) The design and analysis of the generalized finite element method. Comput Method Appl Mech Eng 181(1–3):43–69
Strouboulis T, Copps K, Babuška I (2000) The generalized finite element method: an example of its implementation and illustration of its performance. Int J Numer Method Eng 47:1401–1417
Strouboulis T, Copps K, Babuška I (2001) The generalized finite element method. Comput Method Appl Mech Eng 190(32–33):4081–4193
Strouboulis T, Zhang L, Babuška I (2003) Generalized finite element method using mesh-based handbooks: application to problems in domains with many voids. Comput Method Appl Mech Eng 192:3109–3161
Strouboulis T, Zhang L, Babuška I (2004) \(p\)-version of the generalized FEM using mesh-based handbooks with applications to multiscale problems. Int J Numer Method Eng 60:1639–1672
Strouboulis T, Zhang L, Wang D, Babuška I (2006) A posteriori error estimation for generalized finite element methods. Comput Method Appl Mech Eng 195:852–879
Strouboulis T, Babuška I, Hidajat R (2006) The generalized finite element method for Helmholtz equation: theory, computation, and open problems. Comput Method Appl Mech Eng 195:4711–4731
Strouboulis T, Hidajat R, Babuška I (2008) The generalized finite element method for Helmholtz equation, part II: effect of choice of handbook functions, error due to absorbing boundary conditions and its assessment. Comput Method Appl Mech Eng 197:364–380
Duarte CA, Babuška I, Oden JT (2000) Generalized finite element methods for three-dimensional structural mechanics problems. Comput Struct 77:215–232
Duarte CA, Hamzeh ON, Liszka TJ, Tworzydlo WW (2001) A generalized finite element method for the simulation of three-dimensional dynamic crack propagation. Comput Method Appl Mech Eng 190:2227–2262
Simone A, Duarte CA, Van der Giessen E (2006) A generalized finite element method for polycrystals with discontinuous grain boundaries. Int J Numer Method Eng 67:1122–1145
Duarte CA, Kim DJ (2008) Analysis and applications of a generalized finite element method with global-local enrichment functions. Comput Method Appl Mech Eng 197(6–8):487–504
O’Hara P, Duarte CA, Eason T (2009) Generalized finite element analysis for three dimensional problems exhibiting sharp thermal gradients. Comput Method Appl Mech Eng 198:1857–1871
Lancaster P, Salkauskas K (1981) Surfaces generated by moving least squares methods. Math Comput 37:141–158
Belytschko T, Lu YY, Gu L (1994) Element-free Galerkin method. Int J Numer Method Eng 37:229–256
Li S, Liu WK (2001) Meshfree and particle methods and their applications. Appl Mech Rev 55:1–34
Cecka C, Lew A, Darve E (2011) Assembly of finite element methods on graphics processors. Int J Numer Method Eng 85(5):640–669
Karatarakis A, Metsis P, Papadrakakis M (2013) GPU-acceleration of stiffness matrix calculation and efficient initialization of EFG meshless methods. Comput Method Appl Mech Eng (in press), Accepted Manuscript, Available online 4 March 2013
Buttari A, Dongarra J, Kurzak J, Luszczek P, Tomov S (2008) Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy. ACM Trans Math Softw (TOMS) 34(4):1–22
Göddeke D, Strzodka R, Turek S (2005) Accelerating double precision FEM simulations with GPUs. In Proceedings of ASIM 2005–18th symposium on simulation technique
Strzodka R, Göddeke D (2006) Pipelined mixed precision algorithms on FPGAs for fast and accurate PDE solvers from low precision components. In IEEE symposium on field-programmable custom computing machines (FCCM 2006), pp 259–268
Strzodka R, Göddeke D (2006) Mixed precision methods for convergent iterative schemes. In Proceedings of the 2006 workshop on edge computing using new commodity architectures, pp D-59-60, May 2006
Göddeke D, Strzodka R, Turek S (2007) Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations. Int J Parallel, Emergent Distrib Syst (IJPEDS), Special issue: Appl. Parallel Comput 22(4):221–256
Göddeke D, Strzodka R (2008) Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations (part 2: Double precision GPUs). Technical University Dortmund, Technical report
Kurzak J, Dongarra J (2007) Implementation of mixed precision in solving systems of linear equations on the Cell processor. Concurr Comput Pract Experience 19(10):1371–1385
Wilkinson JH (1963) Rounding errors in algebraic processes. Prentice-Hall, Englewood Cliffs
Moler CB (1967) Iterative refinement in floating point. J ACM 14(2):316–321
Jankowski M, Woźniakowski H (1977) Iterative refinement implies numerical stability. BIT Numer Math 17(3):303–311
Higham NJ (2002) Accuracy and stability of numerical algorithms. Society for Industrial and Applied Mathematics, Philadelphia
Buttari A, Dongarra J, Langou J, Langou J, Luszczek P, Kurzak J (2007) Mixed precision iterative refinement techniques for the solution of dense linear systems. Int J High Perform Comput Appl 21:457–466
Langou J, Langou J, Luszczek P, Kurzak J, Buttari A, Dongarra J (2006) Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems). Proceedings of the 2006 ACM/IEEE conference on supercomputing
Demmel JW (1997) Applied numerical linear algebra. SIAM Press, Philadelphia
Demmel J, Hida Y, Kahan W, Li XS, Mukherjee S, Riedy EJ (2005) Error bounds from extra precise iterative refinement. Technical Report No. UCB/CSD-04-1344, LAPACK Working Note 165, Feb 2005
Taiji M, Narumi T, Ohno Y, Futatsugi N, Suenaga A, Takada N, Konagaya A (2003) Protein explorer: a petaflops special-purpose computer system for molecular dynamics simulations. Proceedings of Supercomputing 2003 in CD-ROM
Anderson E, Bai Z, Bischof C, Blackford LS, Demmel JW, Dongarra JJ, Du Croz J, Greenbaum A, Hammarling S, McKenney A, Sorensen D. LAPACK Users’ Guide. SIAM, http://www.netlib.org/lapack/
Li XS, Demmel JW, Bailey DH, Henry G, Hida Y, Iskandar J, Kahan W, Kang SY, Kapur A, Martin MC, Thompson BJ, Tung T, Yoo DJ (2002) Design, implementation and testing of extended and mixed precision BLAS. ACM Trans Math Softw (TOMS) 28(2):152–205
Göddeke D, Strzodka R, Turek S (2007) Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations. Int J Parallel Emergent Distrib Syst (Special Issue: Applied Parallel Computing) 22(4):221–256
Göddeke D, Wobker H, Strzodka R, Mohd-Yusof J, McCormick P, Turek S (2009) Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU. Int J Comput Sci Eng 4(4):254–269
Strzodka R, Göddeke D (2006) Pipelined mixed precision algorithms on FPGAs for fast and accurate PDE solvers from low precision components. In FCCM’06: Proceedings of the 14th annual IEEE symposium on field-programmable custom computing machines (FCCM’06) pp 259–270
Strzodka R, Göddeke D (2006) Mixed precision methods for convergent iterative schemes. In Proceedings of the 2006 workshop on edge computing using new commodity architectures, D-59-60
Kurzak J, Dongarra JJ (2007) Implementation of mixed precision in solving systems of linear equations on the CELL processor. Concurr Comput Pract Experience 19(10):1371–1385
Liu J, Wang C, Ren J, Tian R (2012) A mixed precision explicit finite element algorithm on heterogeneous architecture and its CUDA implementation. Comput Sci 39(6):293–296 (in Chinese)
Liu J (2011) A mixed precision GPU acceleration algorithm and its application to FEM. MS thesis of Graduate School of Chinese Academy of Sciences (in Chinese)
Aifantis EC (1992) On the role of gradients in the localization of deformation and fracture. Int J Eng Sci 30(10):1279–1299
Hill R (1963) Elastic properties of reinforced solids: some theoretical principles. J Mech Phys Solids 11(5):357–372
Hill R (1972) On constitutive macro-variables for heterogeneous solids at finite strain. Proc R Soc Lond Ser A Math Phys Sci 326(1565):131–147
Tian R, Yagawa G (2005) Generalized node and high-performance elements. Int J Numer Method Eng 64:2039–2071
Tian R, Yagawa G, Terasaka H (2006) Linear dependence problems of partition of unity based generalized FEMs. Comput Method Appl Mech Eng 195:4768–4782
Tian R (2006) A PU-based 4-node quadratic tetrahedron and linear dependence elimination in three dimensions. Int J Comput Method 3:545–562
Tian R, Matsubara H, Yagawa G (2006) Advanced 4-node tetrahedrons. Int J Numer Methods Eng 68:1209–1231
Tian R, Yagawa G (2006) Allman’s triangle, rotational dof and partition of unity. Int J Numer Method Eng 69:837–858
McVeigh C, Liu WK (2010) Multiresolution continuum modeling of micro-void assisted dynamic adiabatic shear band propagation. J Mech Phys Solid 58(2):187–205
McVeigh C, Vernerey F, Liu WK, Brinson C (2006) Multiresolution analysis for material design. Comput Method Appl Mech Eng 195:5053–5076
McVeigh C, Vernerey FJ, Liu WK, Moran B, Olson GB (2007) An Interactive microvoid shear localization mechanism in high strength steels. J Mech Phys Solids 55(2):224–225
McVeigh C (2007) Ph.D. Thesis, Northwestern University
McVeigh C, Liu WK (2008) Linking microstructure and properties through a predictive multiresolution continuum. Comput Method Appl Mech Eng 197:3268–3290
McVeigh C, Liu WK (2009) Multiresolution modeling of ductile reinforced brittle composites. J Mech Phys Solids 57:244–267
Tian R, Moran B, Liu WK, Olson GB (2008) Multiscale fracture simulator. Dynamic microstructure design consortium (ONR Contract: N00014–05-C-0241) base final Report
Tian R, Liu WK, Chan S, Olson GB, Tang S, Wang JS, Jou HJ, Gong JD, Moran B (2009) Linking microstructures to fracture toughness-predictive 3D process zone simulations. The D 3-D annual PI Review, Evanston, March 23–25
Tian R, Chan S, Tang S, Kopacz AM, Wang JS, Jou HJ, Siad L, Lindgren LE, Olson GB, Liu WK (2010) A multi-resolution continuum simulation of the ductile fracture process. J Mech Phys Solids 58(10):1681–1700
Dongarra J et al. The international exascale software project roadmap. www.iesp.org
Schroeder B, Gibson GA (2006) A large-scale study of failures in high-performance computing systems. Proceedings of the international conference on dependable systems and networks pp 249–258
Liu Y (2007) Reliability-aware optimal checkpoint/restart model in high performance computing, PhD Thesis. Louisiana
Cappello F, Geist A, Gropp B et al (2009) Toward exascale resilience. Int J High Perform Comput Appl 23:374–388
Geist A (2009) Co-design challenges going from petascale to exascale. Workshop on bio-molecular simulations on future computing architectures, Oak Ridge
Li L, Wang C, Ma Z, Tian R (2013) petaPar: a highly scalable and fault tolerant meshfree/particle simulation code based on free assembly mesh. HPC China 2013, Guilin, China, October 29–31, 2013
Gingold RA, Monaghan JJ (1977) Smoothed particle hydrodynamics: theory and application to non-spherical stars. Mon Not R Astron Soc 181:375–389
Libersky LD, Petschek AG (1990) Smooth particle hydrodynamics with strength of materials. Adv Free Lagrange Method Lect Notes Phys 395:248–257
Liu MB, Liu GR (2010) Smoothed particle hydrodynamics (SPH): an overview and recent developments. Arch Comput Method Eng 17:25–76
Warren MS, Salmon JK (1995) A portable parallel particle program. Comput Phys Commun 87(1):266–290
Goozee RJ, Jacobs PA (2003) Distributed and shared memory parallelism with a smoothed particle hydrodynamics code. Anziam J 44:202–228
Maruzewski P, Le Touzé D, Oger G et al (2010) SPH high-performance computing simulations of rigid solids impacting the free-surface of water. J Hydraul Res 48(S1):126–134
Springel V (2005) The cosmological simulation code gadget-2. Mon Not R Astron Soc 364(4):1105–1134
Holmes DW, Williams JR, Tilke P (2011) A framework for parallel computational physics algorithms on multi-core: SPH in parallel. Adv Eng Softw 42(11):999–1008
Ihmsen M, Akinci N, Becker M et al (2011) A parallel SPH implementation on multi-core CPUs. Comput Graph Forum 30(1):99–112
Harada T, Koshizuka S, Kawaguchi Y (2007) Smoothed particle hydrodynamics on GPUs. Proc Comput Graph Int pp 63–70
Hérault A, Bilotta G, Dalrymple RA (2010) SPH on GPU with CUDA. J Hydraul Res 48(S1):74–79
Valdez-Balderas D, Domínguez JM, Rogers BD et al (2012) Towards accelerating smoothed particle hydrodynamics simulations for free surface flows on multi-GPU clusters. J Parallel Distr Com
Domínguez JM, Crespo AJC, Valdez-Balderas D et al (2013) New multi-GPU implementation for smoothed particle hydrodynamics on heterogeneous clusters. Comput Phys Commun 184:1848–1860
Domínguez JM, Crespo AJC, Gómez-Gesteira M (2013) Optimization strategies for CPU and GPU implementations of a smoothed particle hydrodynamics method. Comput Phys Commun 184:617–627
Sulsky D, Chen Z, Schreyer HL (1994) A particle method for history-dependent materials. Comput Method Appl Mech Eng 118:179–196
Love E, Sulsky DL (2006) An unconditionally stable, energy-momentum consistent implementation of the material-point method. Comput Method Appl Mech Eng 195(33–36):3903–3925
Wallstedt PC, Guilkey JE (2008) An evaluation of explicit time integration schemes for use with the generalized interpolation material point method. J Comput Phys 227(22):9628–9642
Zhang DZ, Ma X, Giguere PT (2011) Material point method enhanced by modified gradient of shape function. J Comput Phys 230(16):6379–6398
Więckowski Z (2004) The material point method in large strain engineering problems. Comput Method Appl Mech Eng 193(39–41):4417–4438
Sulsky D, Kaul A (2004) Implicit dynamics in the material-point method. Comput Method Appl Mech Eng 193(12–14):1137–1170
Wang HK, Liu Y, Zhang X (2012) The carbon nanotube composite simulation by material point method. Comput Mater Sci 57:23–29
Zhang X, Sze KY, Ma S (2006) An explicit material point finite element method for hyper velocity impact. Int J Numer Method Eng 66:689–706
Lian YP, Zhang X, Liu Y (2012) An adaptive finite element material point method and its application in extreme deformation problems. Comput Method Appl Mech Eng 241–244:275–285
Lian YP, Zhang X, Liu Y (2011) Coupling of finite element method with material point method by local multi-mesh contact method. Comput Method Appl Mech Eng 200(47–48):3482–3494
Więckowski Z (2004) The material point method in large strain engineering problems. Comput Method Appl Mech Eng 193(39–41):4417–4438
Sulsky D, Kaul A (2004) Implicit dynamics in the material-point method. Comput Method Appl Mech Eng 193(12–14):1137–1170
Joubert W (2012) Porting the denovo radiation transport code to Titan: lessons learned. OLCF Titan Workshop 2012
Cappello F (2009) Fault tolerance in petascale/exascale systems: current knowledge, challenges and research opportunities. Int J High Perform Comput Appl 23:212–226
Keyes D (2012) Large-scale simulation in science and engineering: digesting the fruit, replanting the fields. Co-Design 2012, Beijing, China, October 23–25, 2012
Ren J, Wang CW, Wang Y, Tian R (2013) Scalability tests of a finite element code on hundreds of thousands cores and heterogeneous architecture. Comm Comp Info Sci 207:151–165
Acknowledgments
Zedong Wu helped to calculate the data of Fig. 7, Chaowei Wang helped with the implementation of the mixed precision algorithm of msFEM, and Jianhua Liu and Jiangyong Ren helped with Figs. 8, 9, and 10. Leisheng Li helped to parallelize petaPar and to obtain the data of Fig. 11. Fuxi Zhang and Zhigang Huo coded dcrd. The research described in this paper was financially supported by the “100 Talent Program” of the Chinese Academy of Sciences and the National Natural Science Foundation of China (Grant numbers: 11072241, 11111140020, 91130026, 60633040). This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research used resources of the National Supercomputing Centre in Shenzhen, China.
About this article
Cite this article
Tian, R. Simulation at Extreme-Scale: Co-Design Thinking and Practices. Arch Computat Methods Eng 21, 39–58 (2014). https://doi.org/10.1007/s11831-014-9095-y
DOI: https://doi.org/10.1007/s11831-014-9095-y