**Aim of the course**

The course aims to provide a thorough knowledge of designing and implementing parallel algorithms for problems in the area of scientific computing.

Learning goals:

After completion of the course, the student is able to:

- design a parallel algorithm for a problem from the area of scientific computing;
- analyse the cost of the algorithm in terms of computing time, communication time and synchronisation time;
- write a parallel program based on an algorithm that solves a problem;
- write a report on the algorithm, its analysis, and numerical experiments performed, in a concise and readable form.

Today, parallel computers are appearing on our desktops. The advent of dual-core and quad-core computers, together with the expected increase in the number of cores in the coming years, will inevitably cause a major change in our approach to software, such as the mathematical software we use in scientific computations.

Parallel computers drastically increase our computational capabilities and thus enable us to model more realistically in many application areas. To make efficient use of parallel computers, it is necessary to reorganize the structure of our computational methods. In particular, attention must be paid to the division of work among the different processors solving a problem in parallel and to the communication between them. Suitable parallel algorithms and systems software are needed to realize the capabilities of parallel computers.
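As a small illustration of dividing work evenly, the sketch below splits `n` items into contiguous blocks over `p` processors so that block sizes differ by at most one. The function names `block_size` and `block_start` are hypothetical and chosen for this example only; a block distribution is just one of several possible choices (a cyclic distribution is another).

```c
#include <assert.h>

/* Hypothetical helper: number of items owned by processor s (0 <= s < p)
   when n items are split into p contiguous blocks of nearly equal size.
   The first n % p processors receive one extra item. */
long block_size(long n, long p, long s) {
    assert(p > 0 && s >= 0 && s < p && n >= 0);
    return n / p + (s < n % p ? 1 : 0);
}

/* Hypothetical helper: global index of the first item owned by processor s. */
long block_start(long n, long p, long s) {
    assert(p > 0 && s >= 0 && s < p && n >= 0);
    long r = n % p;                       /* number of larger blocks */
    return s * (n / p) + (s < r ? s : r); /* skip the extra items before s */
}
```

For example, with `n = 10` and `p = 4` the block sizes are 3, 3, 2, 2 and the blocks start at indices 0, 3, 6, 8, so no processor holds more than one item above the average.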

We will extensively discuss the most recent developments in the area of parallel computers, ranging from multi-core desktop PCs to clusters of PCs connected by switching devices, to massively parallel computers with distributed memory such as our national supercomputer Cartesius at SURFsara in Amsterdam.

The following subjects are treated:

- Types of existing parallel computers;
- Principles of parallel computation: distributing the work evenly and avoiding communication;
- The Bulk Synchronous Parallel (BSP) model as an idealized model of the parallel computer;
- Use of BSPlib software as the basis for architecture-independent programs;
- Parallel algorithms for the following problems: prime number sieving, LU decomposition, Fast Fourier Transform, sparse matrix-vector multiplication, iterative solution of linear systems, graph matching;
- Analysis of the computation, communication and synchronization time requirements of these algorithms using the BSP model;
- Hands-on experience in the laboratory class, using the Cartesius supercomputer.
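
To make the cost analysis listed above concrete, here is a minimal sketch in C of one common formulation of the BSP cost function: a superstep with at most `w` flops of local computation and at most `h` data words sent or received by any processor costs `w + h*g + l`, where `g` is the communication cost per word and `l` the synchronization cost, both expressed in flop time units. The type and function names (`superstep`, `bsp_machine`, `bsp_cost`) are hypothetical and not part of BSPlib.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one superstep of an algorithm. */
typedef struct {
    double w; /* maximum local computation of any processor, in flops */
    double h; /* maximum number of words sent or received by any processor */
} superstep;

/* Hypothetical machine parameters of the BSP model, in flop time units. */
typedef struct {
    double g; /* communication cost per data word */
    double l; /* cost of one global synchronization */
} bsp_machine;

/* Total BSP cost: the sum of w + h*g + l over all supersteps. */
double bsp_cost(const superstep *steps, size_t nsteps, bsp_machine m) {
    assert(steps != NULL || nsteps == 0);
    double total = 0.0;
    for (size_t i = 0; i < nsteps; i++) {
        total += steps[i].w + steps[i].h * m.g + m.l;
    }
    return total;
}
```

As a usage example with made-up numbers: two supersteps with `(w, h)` equal to `(100, 10)` and `(50, 0)` on a machine with `g = 2` and `l = 20` give a total cost of `150 + 10*2 + 2*20 = 210` flop units, which separates the computation, communication and synchronization contributions.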

**Prerequisites**

Introductory course in linear algebra. Some knowledge of algorithms.

Good knowledge of a modern programming language such as C, C++, Java, or Python.

Basic knowledge of C is helpful, as we will use this language in class.

For a tutorial in C, if you come from another programming language, see Appendix A (pages 345-364) of "21st Century C: C Tips from the New School, 2nd Edition", by Ben Klemens, O'Reilly 2014.

- Lecturer: Rob Bisseling