1st exaFOAM Workshop Resources
Welcome to the 1st exaFOAM Workshop resource page!
Here you will find a collection of videos and slides that cover the content presented in the workshop. Whether you attended the workshop or missed it, this page gives you the opportunity to catch up on the material and deepen your understanding of the exaFOAM project.
The HPC Grand Challenges are designed to push available HPC systems to their limits and to showcase the performance gains achieved over the duration of the project. The validation includes the definition of suitable acceptance criteria. The computational performance of the original and the exascale-enhanced OpenFOAM code will be compared in a direct before/after evaluation of the project objectives. Two HPC Grand Challenges have been defined: the high-fidelity numerical simulation of a combustor based on the experimental setup “Confined Jet High-pressure” (CJH) from the German Aerospace Center (DLR), and a high-fidelity aerodynamic simulation of the NASA CRM aircraft model with deployed high-lift devices.
Slides / Video
Overview of the code refactoring activities in the exaFOAM project, including coupled implicit solution algorithms, highly parallel linear solvers, a plug-in to external linear algebra solvers, parallel I/O with external packages, a plug-in to an external ODE solver for fast chemistry calculation, data management and compression for unsteady adjoint simulations, improvement of the parallel efficiency of the GGI/AMI implementation, and parallel load balancing with mesh migration.
Slides
Microbenchmarks have been derived from the industrial applications as well as the HPC Grand Challenges, making them suitable test cases for continuous assessment of the software components during development. At least one microbenchmark was derived from each application, capturing and highlighting its computational bottlenecks at minimal computational cost. The microbenchmarks are available in the repository of the OpenFOAM HPC Technical Committee.
Slides / Video
Simulating multi-dimensional combustion with detailed kinetics requires solving a large number of ordinary differential equation (ODE) problems at each global time step of the fluid dynamic simulation. In many cases, the ODE integrations account for the bulk of the total wall-clock time of the simulation. The new library offloads the ODE integration with multi-level parallelization: the fluid problem is decomposed on the CPU via MPI, while the chemical problem is parallelized on the GPU. Clusters of cells are solved in parallel using CUDA blocks, and the species within the cells are solved concurrently using CUDA threads. The resulting speedup is appreciable, though it depends on the test case.
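The sketch below (CUDA C++) illustrates this parallel layout only: one CUDA block per cluster of cells, with the threads of the block advancing the species of those cells concurrently. A single explicit Euler update stands in for the stiff ODE integrator of the actual library, and all names and the flat data layout are hypothetical.

```cpp
// Hypothetical layout sketch: one CUDA block per cluster of cells; the
// threads of the block advance the (cell, species) pairs of that cluster
// concurrently. The explicit Euler update below is a placeholder for the
// stiff ODE integrator used in practice.
__global__ void integrateClusterChemistry
(
    double* Y,            // species mass fractions, flat [cluster][cell][species]
    const double* rates,  // net production rates, same layout as Y
    int cellsPerCluster,
    int nSpecies,
    double dt
)
{
    const int cluster = blockIdx.x;
    const int perCluster = cellsPerCluster*nSpecies;

    // Each thread handles a strided subset of (cell, species) pairs
    for (int i = threadIdx.x; i < perCluster; i += blockDim.x)
    {
        const int idx = cluster*perCluster + i;
        Y[idx] += dt*rates[idx];  // placeholder for the stiff ODE update
    }
}

// Host side, per MPI rank: the fluid domain is decomposed with MPI, the
// local cells are grouped into clusters and offloaded to the GPU, e.g.
//   integrateClusterChemistry<<<nClusters, 128>>>(d_Y, d_rates,
//                                                 cellsPerCluster, nSpecies, dt);
```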
Slides / Video
A revised parallel I/O method is being developed, implemented and tested. The target is an I/O strategy that fully exploits the performance of the underlying I/O system through parallel data access, independently of the number of MPI processes and of the number of cores used, removing major scalability and usability bottlenecks. The new I/O is implemented natively in OpenFOAM and is based on ADIOS2. In contrast to the current I/O strategy, the number of files and the number of mesh/field decomposition/reconstruction/redistribution steps are substantially reduced, while the user can still efficiently make changes to already decomposed cases.
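As a rough sketch of the ADIOS2 C++ API that this work builds on (not the exaFOAM implementation itself, and with purely illustrative names and sizes), each MPI rank below writes its partition of a field into a single BP dataset, so the number of output files no longer grows with the number of ranks.

```cpp
#include <adios2.h>   // requires ADIOS2 built with MPI support
#include <mpi.h>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0, nRanks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nRanks);

    const std::size_t nLocalCells = 1000;        // illustrative partition size
    std::vector<double> p(nLocalCells, 1.0e5);   // local pressure field

    const std::size_t shape = static_cast<std::size_t>(nRanks)*nLocalCells;
    const std::size_t start = static_cast<std::size_t>(rank)*nLocalCells;

    adios2::ADIOS adios(MPI_COMM_WORLD);
    adios2::IO io = adios.DeclareIO("fieldOutput");

    // Global 1D array: global shape, this rank's offset, local extent
    auto varP = io.DefineVariable<double>("p", {shape}, {start}, {nLocalCells});

    // All ranks write collectively into one BP dataset
    adios2::Engine writer = io.Open("postProcessing.bp", adios2::Mode::Write);
    writer.BeginStep();
    writer.Put(varP, p.data());
    writer.EndStep();
    writer.Close();

    MPI_Finalize();
    return 0;
}
```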
Slides / Video
The numerical simulation of profile extrusion of polymers has clear industrial potential. The existing segregated solver for incompressible viscoelastic fluid flows with constitutive equations for the rheological behaviour of polymers, the viscoelasticFluidFoam solver from foam-extend, has been improved. In addition, a block-coupled approach based on the existing block-coupling infrastructure, consisting of block-matrix assembly tools and block-coupled “dense-on-sparse” linear equation solvers, is under development.
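As a schematic illustration of what block coupling means here (not tied to a specific constitutive model, and with generic block names), the velocity u and the viscoelastic stress tau are assembled and solved as one monolithic system, whereas the segregated solver lags the off-diagonal coupling blocks between iterations:

```latex
% Generic block-coupled system for velocity u and viscoelastic stress tau;
% the off-diagonal blocks carry the coupling that a segregated solver
% would treat explicitly between iterations.
\begin{bmatrix}
  A_{uu}     & A_{u\tau} \\
  A_{\tau u} & A_{\tau\tau}
\end{bmatrix}
\begin{bmatrix}
  \mathbf{u} \\
  \boldsymbol{\tau}
\end{bmatrix}
=
\begin{bmatrix}
  b_{u} \\
  b_{\tau}
\end{bmatrix}
```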
Slides / Video
Sparse matrix-vector multiplication (SpMV) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix storage format is highly hardware-specific, which can become an obstacle on heterogeneous systems. Due to the relative importance of matrix assembly compared to matrix solution, the data structures for mesh handling and discretization are re-used in the core matrix operations, namely SpMV and various preconditioners.
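For context, the sketch below shows a matrix-vector product in an LDU-style, face-addressed format similar to the one OpenFOAM uses for its finite-volume matrices (the function and argument names are illustrative, not the OpenFOAM API); the indirect addressing through owner/neighbour lists is a large part of what makes the kernel's performance hardware-dependent.

```cpp
#include <cstddef>
#include <vector>

// LDU-style SpMV sketch: one diagonal coefficient per cell plus one
// lower/upper coefficient per internal face, addressed by the face's
// owner and neighbour cell indices.
void lduSpMV
(
    const std::vector<double>& diag,    // one entry per cell
    const std::vector<double>& lower,   // one entry per internal face
    const std::vector<double>& upper,   // one entry per internal face
    const std::vector<int>& owner,      // face -> lower-numbered cell
    const std::vector<int>& neighbour,  // face -> higher-numbered cell
    const std::vector<double>& x,
    std::vector<double>& Ax             // result, sized to the number of cells
)
{
    // Diagonal contribution
    for (std::size_t c = 0; c < diag.size(); ++c)
    {
        Ax[c] = diag[c]*x[c];
    }

    // Off-diagonal contributions: each face adds one pair of terms.
    // The indirect, face-based access pattern is what makes the optimal
    // storage format so hardware-specific.
    for (std::size_t f = 0; f < owner.size(); ++f)
    {
        Ax[owner[f]]     += upper[f]*x[neighbour[f]];
        Ax[neighbour[f]] += lower[f]*x[owner[f]];
    }
}
```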
Slides / Video
In gradient-based aerodynamic optimization, the adjoint approach is the state-of-the-art method for computing gradients of an objective function (e.g. lift/drag forces) w.r.t. the design variables parameterizing an aerodynamic shape (e.g. a car or aircraft). Using the adjoint approach in transient flow problems, such as those involving the high-fidelity turbulence models (DES, LES) typically used in HPC environments, requires the solution of an additional set of PDEs, derived from the Navier-Stokes equations but solved backwards in time. This poses the problem of storing all flow field instances computed during the solution of the unsteady Navier-Stokes equations, to be re-used in the solution of the adjoint PDEs. Compressing the primal solution time-series is therefore extremely beneficial, both memory-wise and cost-wise.
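The snippet below is a deliberately simplified illustration of that storage trade-off, built around a hypothetical CompressedHistory helper that keeps only every stride-th snapshot and reconstructs intermediate time steps by linear interpolation during the backward adjoint sweep; it is a stand-in for the more sophisticated compression techniques investigated in the project.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Store only every 'stride'-th primal snapshot; reconstruct the remaining
// time instants by linear interpolation when the adjoint solver, marching
// backwards in time, asks for them.
struct CompressedHistory
{
    std::size_t stride;
    std::vector<std::vector<double>> snapshots;

    // Called during the forward (primal) solve
    void store(std::size_t timeIndex, const std::vector<double>& field)
    {
        if (timeIndex % stride == 0)
        {
            snapshots.push_back(field);
        }
    }

    // Called during the backward (adjoint) solve
    std::vector<double> reconstruct(std::size_t timeIndex) const
    {
        const std::size_t i0 = std::min(timeIndex/stride, snapshots.size() - 1);
        const std::size_t i1 = std::min(i0 + 1, snapshots.size() - 1);
        const double w = double(timeIndex % stride)/double(stride);

        std::vector<double> field(snapshots[i0].size());
        for (std::size_t c = 0; c < field.size(); ++c)
        {
            field[c] = (1.0 - w)*snapshots[i0][c] + w*snapshots[i1][c];
        }
        return field;
    }
};
```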
Slides / Video
A detailed performance analysis of different exaFOAM use cases has been conducted, with the aim of spotting inefficiencies at different levels: instruction, vectorization, memory access pattern, parallelization, and I/O. Firstly, the POP methodology has been applied, which defines a set of efficiency metrics that help to identify the characteristics of the code leading to a performance inefficiency. Secondly, detailed profiles generated with Extrae and visualised with Paraver were analysed, showing the performance bottlenecks in detail.
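For reference, the top-level POP metrics can be computed from the per-process useful computation time and the total runtime, as in the minimal sketch below (the numbers are illustrative, not measured exaFOAM data): parallel efficiency factorises into load balance and communication efficiency, and a low factor points to the corresponding class of bottleneck.

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main()
{
    // Illustrative numbers only: useful computation time per MPI rank [s]
    const std::vector<double> usefulTime = {9.0, 8.5, 7.0, 6.5};
    const double runtime = 10.0;  // total wall-clock time [s]

    const double avgUseful =
        std::accumulate(usefulTime.begin(), usefulTime.end(), 0.0)
       /usefulTime.size();
    const double maxUseful =
        *std::max_element(usefulTime.begin(), usefulTime.end());

    const double loadBalance = avgUseful/maxUseful;  // LB
    const double commEff     = maxUseful/runtime;    // CommE
    const double parallelEff = loadBalance*commEff;  // PE = LB * CommE

    std::printf("LB = %.2f, CommE = %.2f, PE = %.2f\n",
                loadBalance, commEff, parallelEff);
    return 0;
}
```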
Slides / Video
To analyse a code, instrumentation-based techniques can be used, as sampling-based measurement methods are currently not suitable for automated performance modelling due to their statistical nature. In addition, instrumentation has the potential for higher measurement detail and lower total overhead. To support measurement in OpenFOAM, the process was substantially automated: using the InstRO tool, based on Clang/LLVM, a human-driven automatic instrumentation tool has been developed, which allows for selective, low-overhead instrumentation of OpenFOAM.
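As a generic illustration of how instrumentation-based measurement works (this is the standard GCC/Clang -finstrument-functions mechanism, not the code InstRO generates), the compiler inserts calls to entry/exit hooks in every selected function, where a measurement system can start and stop timers.

```cpp
#include <cstdio>

extern "C"
{
    // Keep the hooks themselves out of the instrumentation
    void __cyg_profile_func_enter(void*, void*) __attribute__((no_instrument_function));
    void __cyg_profile_func_exit(void*, void*) __attribute__((no_instrument_function));

    void __cyg_profile_func_enter(void* fn, void* /*callSite*/)
    {
        std::fprintf(stderr, "enter %p\n", fn);  // a real tool would start a timer here
    }

    void __cyg_profile_func_exit(void* fn, void* /*callSite*/)
    {
        std::fprintf(stderr, "exit  %p\n", fn);  // a real tool would stop the timer here
    }
}

// Build, for example:  g++ -finstrument-functions hooks.cpp solver.cpp
```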
Slides / Video
Measuring the performance of the microbenchmarks on different types of architecture helps to observe the effect of the system and to understand how to take advantage of a specific architecture, highlighting its strong and weak points and showing how the use-case characteristics depend on the architecture. The measurements were carried out at the E4 facilities on the Armida cluster, which provides heterogeneous CPU architectures for small-scale HPC software testing.
Slides