This course teaches the fundamentals needed to exploit the ever-increasing power of GPUs and HPC clusters.
The course will start with an architectural overview of modern HPC and GPU-based heterogeneous architectures, focusing on their computing power versus their data movement needs.
The course will cover classical parallel programming with MPI and OpenMP, as well as two approaches to GPU programming: a high-level, pragma-based approach with OpenACC for quickly getting started with porting, and a lower-level approach based on NVIDIA CUDA for finer-grained, computationally intensive tasks.
Particular attention will be given to performance tuning and to techniques for overcoming common data-movement bottlenecks and patterns.
- Luca Ferraro (Senior Software Developer at CINECA)
- Sergio Orlandini (HPC Software Engineer at CINECA)
By the end of the course, participants will be able to:
- understand the strengths and weaknesses of GPUs as accelerators
- program GPU-accelerated applications using both higher- and lower-level programming approaches
- overcome problems and bottlenecks regarding data movement between host and device memories
- make the best use of independent execution queues to overlap computation with data movement
Researchers and programmers interested in porting scientific applications to, or in using efficient post-processing and data-analysis techniques on, modern heterogeneous HPC architectures.
A basic knowledge of C or Fortran programming and of the Linux or Unix environment is mandatory. A basic knowledge of any parallel programming technique or paradigm is recommended.