Implementation and Scalability Analysis of Balancing Domain Decomposition Methods

Badia, Santiago; Martín, Alberto F.; Principe, Javier

doi:10.1007/s11831-013-9086-4

Implementation and Scalability Analysis of Balancing Domain Decomposition Methods

Published: 02 July 2013

Volume 20, pages 239–262, (2013)
Cite this article

Archives of Computational Methods in Engineering Aims and scope Submit manuscript

Santiago Badia^1,2,
Alberto F. Martín^1,2 &
Javier Principe^1,2

625 Accesses
21 Citations
Explore all metrics

Abstract

In this paper we present a detailed description of a high-performance distributed-memory implementation of balancing domain decomposition preconditioning techniques. This coverage provides a pool of implementation hints and considerations that can be very useful for scientists that are willing to tackle large-scale distributed-memory machines using these methods. On the other hand, the paper includes a comprehensive performance and scalability study of the resulting codes when they are applied for the solution of the Poisson problem on a large-scale multicore-based distributed-memory machine with up to 4096 cores. Well-known theoretical results guarantee the optimality (algorithmic scalability) of these preconditioning techniques for weak scaling scenarios, as they are able to keep the condition number of the preconditioned operator bounded by a constant with fixed load per core and increasing number of cores. The experimental study presented in the paper complements this mathematical analysis and answers how far can these methods go in the number of cores and the scale of the problem to still be within reasonable ranges of efficiency on current distributed-memory machines. Besides, for those scenarios where poor scalability is expected, the study precisely identifies, quantifies and justifies which are the main sources of inefficiency.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Containerization technologies: taxonomies, applications and challenges

Article 08 June 2021

Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

Cloud benchmarking and performance analysis of an HPC application in Amazon EC2

Article Open access 28 June 2023

Notes

The spaces \(\widehat{V}\) and V _i can also be understood in a functional setting as the global and local spaces of discrete harmonic functions (see [6]).
In a functional setting, functions in \(\widehat{V}\) are uni-valued on Γ. On the contrary, since every node p in Γ _h is replicated n(p) times in \(\varPi_{i=1}^{n_{\mathrm{sbd}}}\varGamma_{i}\), functions in V can take different values at different subdomains. As in [42], \(\widehat{\left . \cdot \right . }\) is used to denote uni-valued functions on Γ.
It is based on the observation that SI ₀ v ₀ for v ₀∈H ₀ can readily be obtained as a linear combination of Sϕ ⁱ quantities, that have already been computed when assembling S ₀ at the preconditioner set-up. This observation, combined with a slight modification of the PCG recurrence described in [2], spares one Schur complement-vector product per PCG iteration.
This definition of {G _a:a=1,…,n _cts} generates a unique partition of Γ _h.
BDD methods can certainly be reformulated as preconditioners for the global linear system (1). This enables approximate solvers (e.g., AMG [35, 40, 43]) to be used in conjunction with BDD methods. Although this approach relaxes the arithmetic/memory demands of sparse direct solvers (particularly in 3D), it would result in a new source of difficulties (e.g., robustness/complexity trade-off evaluation, parameter tuning, load unbalancing issues, poor flop rates) that deserve further research.
For the BNN method, we assume that the initial value x ₀ is such that r ₀ is balanced (see [28]).
An explicit assembly of S _i is required for some DD preconditioners (see e.g. [17]). Besides, this approach allows one to exploit the dense level 2 BLAS for the Schur complement-vector product, which can only compensate the expensive set-up of S _i for non-scalable preconditioners with high iteration counts.
For example, PARDISO is based on the sparse Cholesky factorization without pivoting for symmetric positive definite problems, while for symmetric indefinite problems, it uses a more expensive sparse LDL ^T factorization which, for numerical stability purposes, combines static (prior-to-factorization) pivoting via symmetric weighted matchings and classical Bunch-Kaufman dynamic (during factorization) pivoting only applied inside the supernodes [36, 37].

References

Badia S, Codina R (2008) Algebraic pressure segregation methods for the incompressible Navier-Stokes equations. Arch Comput Methods Eng 15:343–369
Article MathSciNet MATH Google Scholar
Badia S, Martín AF, Príncipe J (2012) Enhanced balancing Neumann-Neumann preconditioning in computational fluid and solid mechanics. Int J Numer Methods Eng (in press)
Baker AH, Gamblin T, Schulz M, Yang UM (2011) Challenges of scaling algebraic multigrid across modern multicore architectures In: Parallel distributed processing symposium (IPDPS). 2011 IEEE international, pp 275–286
Google Scholar
Balay S, Brown J, Buschelman K, Eijkhout V, Gropp WD, Kaushik D, Knepley MG, McInnes LC, Smith BF, Zhang H (2012) PETSc users manual. Technical report ANL-95/11, Argonne National Laboratory
Balay S, Brown J, Buschelman K, Gropp WD, Kaushik D, Kne- pley MG, McInnes LC, Smith BF, Zhang H (2012). PETSc web page. http://www.mcs.anl.gov/petsc
Brenner SC, Scott R (2010) The mathematical theory of finite element methods, 3rd edn. Springer, Berlin
Google Scholar
Conceição D, Goldfeld P, Sarkis M (2007) Robust two-level lower-order preconditioners for a higher-order stokes discretization with highly discontinuous viscosities. In: High performance computing for computational science—VECPAR 2006, pp 319–333
Chapter Google Scholar
Davis TA (2006) Direct methods for sparse linear systems, vol 2. SIAM, Philadelphia
Book MATH Google Scholar
De Roeck YH, Le Tallec P (1991) Analysis and test of a local domain decomposition preconditioner. In: Fourth international symposium on domain decomposition methods for partial differential equations, 112 pp
Google Scholar
Dohrmann CR (2003) A preconditioner for substructuring based on constrained energy minimization. SIAM J Sci Comput 25(1):246–258
Article MathSciNet MATH Google Scholar
Farhat C, Roux F-X (1991) A method of finite element tearing and interconnecting and its parallel solution algorithm. Int J Numer Methods Eng 32(6):1205–1227
Article MATH Google Scholar
Farhat C, Pierson K, Lesoinne M (2000) The second generation FETI methods and their application to the parallel solution of large-scale linear and geometrically non-linear structural analysis problems. Comput Methods Appl Mech Eng 184(2–4):333–374
Article MATH Google Scholar
Ferencz RM, Hughes TJR (1998) Iterative finite element solutions in nonlinear solid mechanics. In: Numerical methods for solids (part 3). Handbook of numerical analysis, vol VI
Google Scholar
Filippone S, Buttari A (2012) Object-oriented techniques for sparse matrix computations in Fortran 2003. ACM Trans Math Softw 38(4):23:1–23:20
Article Google Scholar
Filippone S, Colajanni M (2000) PSBLAS: a library for parallel linear algebra computation on sparse matrices. ACM Trans Math Softw 26(4):527–550
Article Google Scholar
George A (1973) Nested dissection of a regular finite element mesh. SIAM J Numer Anal 10(2):345–363
Article MathSciNet MATH Google Scholar
Giraud L, Haidar A, Watson LT (2008) Parallel scalability study of hybrid preconditioners in three dimensions. Parallel Comput 34(6–8):363–379
Article MathSciNet Google Scholar
Glowinski R, Wheeler MF (1988) Domain decomposition and mixed finite element methods for elliptic problems. In: First international symposium on domain decomposition methods for partial differential equations, pp 144–172
Google Scholar
Goldfeld P (2003) Balancing Neumann-Neumann preconditioners for the mixed formulation of almost-incompressible linear elasticity. PhD thesis
Goldfeld P, Pavarino LF, Widlund OB (2003) Balancing Neumann-Neumann preconditioners for mixed approximations of heterogeneous problems in linear elasticity. Numer Math 95:283–324
Article MathSciNet MATH Google Scholar
Gropp W, Lusk E, Skjellum A (1999) Using MPI: portable parallel programming with the message passing interface, vol 1. MIT Press, Cambridge
Google Scholar
Gropp W, Lusk E, Thakur R (1999) Using MPI-2: advanced features of the message passing interface. MIT Press, Cambridge
Google Scholar
Heroux MA, Willenbring JM (2003) Trilinos users guide. Technical report SAND2003-2952, Sandia National Laboratories
Heroux MA, Bartlett RA, Howle VE, Hoekstra RJ, Hu JJ, Kolda TG, Lehoucq RB, Long KR, Pawlowski RP, Phipps ET, Salinger AG, Thornquist HK, Tuminaro RS, Willenbring JM, Williams A, Stanley KS (2005) An overview of the Trilinos project. ACM Trans Math Softw 31(3):397–423
Article MathSciNet MATH Google Scholar
Hoefler T, Traff JL (2009) Sparse collective operations for MPI. In: Proceedings of the 2009 IEEE international symposium on parallel & distributed processing, pp 1–8
Chapter Google Scholar
Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392
Article MathSciNet Google Scholar
Lin PT, Shadid JN, Sala M, Tuminaro RS, Hennigan GL, Hoekstra RJ (2009) Performance of a parallel algebraic multilevel preconditioner for stabilized finite element semiconductor device modeling. J Comput Phys 228(17):6250–6267
Article MathSciNet MATH Google Scholar
Mandel J (1993) Balancing domain decomposition. Commun Numer Methods Eng 9(3):233–241
Article MathSciNet MATH Google Scholar
Mandel J, Dohrmann CR (2003) Convergence of a balancing domain decomposition by constraints and energy minimization. Numer Linear Algebra Appl 10(7):639–659
Article MathSciNet MATH Google Scholar
Mandel J, Sousedík B, Dohrmann C (2008) Multispace and multilevel BDDC. Computing 83:55–85
Article MathSciNet MATH Google Scholar
Saad Y (1995) Data structures and algorithms for domain decomposition and distributed sparse matrix computations. Technical report 95-014, Department of Computer Science, University of Minnesota, Minneapolis, MN
Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM, Philadelphia
Book MATH Google Scholar
Saad Y, Sosonkina M (1999) Distributed Schur complement techniques for general sparse linear systems. SIAM J Sci Comput 21(4):1337–1356
Article MathSciNet Google Scholar
Sahni O, Carothers CD, Shephard MS, Jansen KE (2009) Strong scaling analysis of a parallel, unstructured, implicit solver and the influence of the operating system interference. Sci Program 17(3):261–274
Google Scholar
Sala M, Tuminaro R (2008) A new Petrov-Galerkin smoothed aggregation preconditioner for nonsymmetric linear systems. SIAM J Sci Comput 31(1):143–166
Article MathSciNet MATH Google Scholar
Schenk O, Gärtner K (2004) Solving unsymmetric sparse systems of linear equations with PARDISO. Future Gener Comput Syst 20(3):475–487
Article Google Scholar
Schenk O, Gärtner K (2006) On fast factorization pivoting methods for sparse symmetric indefinite systems. Electron Trans Numer Anal 23:158–179
MathSciNet MATH Google Scholar
Šístek J, Sousedík B, Burda P, Mandel J, Novotný J (2011) Application of the parallel BDDC preconditioner to the Stokes flow. Comput Fluids 46(1):429–435
Article MathSciNet MATH Google Scholar
Strang G (2006) Linear algebra and its applications. Thomson Brooks/Cole, Belmont
Google Scholar
Stüben K (2001) A review of algebraic multigrid. J Comput Appl Math 128(1–2):281–309
Article MathSciNet MATH Google Scholar
Tang J, Nabben R, Vuik C, Erlangga Y (2009) Comparison of two-level preconditioners derived from deflation, domain decomposition and multigrid methods. J Sci Comput 39(3):340–370
Article MathSciNet MATH Google Scholar
Toselli A, Widlund O (2005) Domain decomposition methods—algorithms and theory. Springer, Berlin
MATH Google Scholar
Vaněk P, Mandel J, Brezina M (1996) Algebraic multigrid by smoothed aggregation for second and fourth order elliptic problems. Computing 56:179–196
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

This work has been funded by the European Research Council under the FP7 Programme Ideas through the Starting Grant No. 258443—COMFUS: Computational Methods for Fusion Technology and the FUSSIM project (Ref. ENE2011-285556), financed by the Spanish Ministry of Science and Innovation (MICINN), under the programme “Proyectos de Investigación fundamental no orientada”. A.F. Martín was also partially funded by the UPC postdoctoral grants under the programme “BKC-5-Atracció i Fidelització de talent al BKC”. The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the Red Española de Supercomputación and the Juelich Supercomputing Centre in the exploitation of the HPC for Fusion (HPC-FF) under the EFDA HPC Implementing Agreement (EFDA (08) 39/4.1).

Author information

Authors and Affiliations

Centre Internacional de Mètodes Numèrics a l’Enginyeria (CIMNE), Parc Mediterrani de la Tecnologia, UPC, Esteve Terradas 5, 08860, Castelldefels, Spain
Santiago Badia, Alberto F. Martín & Javier Principe
Universitat Politècnica de Catalunya, Jordi Girona 1-3, Edifici C1, 08034, Barcelona, Spain
Santiago Badia, Alberto F. Martín & Javier Principe

Authors

Santiago Badia
View author publications
You can also search for this author in PubMed Google Scholar
Alberto F. Martín
View author publications
You can also search for this author in PubMed Google Scholar
Javier Principe
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Santiago Badia.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Badia, S., Martín, A.F. & Principe, J. Implementation and Scalability Analysis of Balancing Domain Decomposition Methods. Arch Computat Methods Eng 20, 239–262 (2013). https://doi.org/10.1007/s11831-013-9086-4

Download citation

Received: 10 March 2013
Accepted: 10 March 2013
Published: 02 July 2013
Issue Date: September 2013
DOI: https://doi.org/10.1007/s11831-013-9086-4

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Implementation and Scalability Analysis of Balancing Domain Decomposition Methods

Abstract

Access this article

Similar content being viewed by others

Containerization technologies: taxonomies, applications and challenges

Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

Cloud benchmarking and performance analysis of an HPC application in Amazon EC2

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Implementation and Scalability Analysis of Balancing Domain Decomposition Methods

Abstract

Access this article

Similar content being viewed by others

Containerization technologies: taxonomies, applications and challenges

Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

Cloud benchmarking and performance analysis of an HPC application in Amazon EC2

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation