Prerequisites
Introductory course in linear algebra. Some knowledge of algorithms.

Knowledge of a modern programming language such as C, C++, Fortran, Python, or Java. Basic knowledge of C is helpful, as we will use this language in class, but the course is organised so that those coming from another language have time to adapt. For those with little or no experience with C, this course is an opportunity to learn a language that is very popular in high-performance scientific and data-intensive computing. Some experience with the Linux operating system is helpful.

Aim of the course
The course aims to provide a thorough grounding in the design and implementation of parallel algorithms for problems in scientific computing and data analysis.

Learning goals

After completing the course, you should be able to:

- design a parallel algorithm for a problem from the area of scientific computing or data analysis;
- analyse the cost of the algorithm using simple models of a parallel machine;
- write a parallel program that implements such an algorithm;
- write a report on the algorithm, its analysis, and numerical experiments performed, in a concise and readable form;
- present the results in an oral presentation, in a clear manner, highlighting the most important ideas.

For decades, computers became faster thanks to ever-shrinking transistors. Unfortunately, this "free lunch" is over: for many years now, increases in computational performance have been achieved almost exclusively by increasing the number of processors in computing systems.
Examples are laptops and smartphones (4-8 cores), compute servers (12-128 cores), and clusters of such servers (many thousands of cores). GPUs, which are popular in data analysis and machine learning, already have thousands of simple processors on a single chip.
This trend is expected to continue, and it forces a major change in how we approach software, including the mathematical software we use in scientific computing, data analysis, and machine learning.

Parallel computers drastically increase our computational capabilities and thus enable more realistic models in many application areas. To make efficient use of parallel computers, we must reorganize the structure of our computational methods. In particular, attention must be paid to the division of work among the processors solving a problem in parallel, and to the communication between them. Suitable parallel algorithms and systems software are needed to realize the capabilities of parallel computers.

The following subjects are treated:

- Types of existing parallel computers;
- Principles of parallel computation: distributing the work evenly and avoiding communication;
- The Bulk Synchronous Parallel (BSP) and Roofline models as idealized models of the parallel computer;
- Use of BSPlib software as the basis for architecture-independent programs (a small illustrative sketch follows this list);
- Parallel algorithms for the following problems: inner product computation, sorting, prime number sieving, LU decomposition, sparse matrix-vector multiplication, iterative solution of linear systems, graph matching;
- Model-based analysis of the computation, communication and synchronization time requirements of these algorithms;
- Hands-on experience in the laboratory class, using the DelftBlue supercomputer at TU Delft.
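To give a flavour of the BSPlib programming style used in the laboratory class, the C sketch below computes an inner product in parallel. It is a minimal illustration under our own assumptions, not course material: the block distribution of the vectors, the vector length N, and all names in the program are illustrative choices, and the exact header name may differ between BSPlib implementations.

```c
#include <stdio.h>
#include <stdlib.h>
#include <bsp.h>          /* BSPlib primitives; header name may differ per implementation */

#define N 1000            /* global vector length (illustrative choice) */

void spmd_inner_product(void) {
    bsp_begin(bsp_nprocs());          /* start the SPMD part on all available processors */
    int p = bsp_nprocs();             /* number of processors */
    int s = bsp_pid();                /* my processor number, 0 <= s < p */

    /* Block distribution: processor s owns global indices s*N/p .. (s+1)*N/p - 1 */
    int lo = s * N / p;
    int hi = (s + 1) * N / p;
    int nloc = hi - lo;

    double *x = malloc(nloc * sizeof(double));
    double *y = malloc(nloc * sizeof(double));
    for (int i = 0; i < nloc; i++) {  /* fill the local parts with sample data */
        x[i] = 1.0;
        y[i] = (double)(lo + i);
    }

    /* Register a buffer so other processors may write their partial sums into it */
    double *partial = malloc(p * sizeof(double));
    bsp_push_reg(partial, p * sizeof(double));
    bsp_sync();                       /* end of superstep: registration takes effect */

    double local = 0.0;               /* local contribution to the inner product */
    for (int i = 0; i < nloc; i++)
        local += x[i] * y[i];

    /* Communicate: put my partial sum into slot s of every processor's buffer */
    for (int t = 0; t < p; t++)
        bsp_put(t, &local, partial, s * sizeof(double), sizeof(double));
    bsp_sync();                       /* end of superstep: all puts have arrived */

    double alpha = 0.0;               /* every processor adds up the p partial sums */
    for (int t = 0; t < p; t++)
        alpha += partial[t];

    if (s == 0)
        printf("inner product = %f\n", alpha);

    bsp_pop_reg(partial);
    free(partial); free(x); free(y);
    bsp_end();
}

int main(int argc, char **argv) {
    bsp_init(spmd_inner_product, argc, argv);  /* set up the SPMD part before bsp_begin */
    spmd_inner_product();
    return 0;
}
```

The two bsp_sync calls delimit the supersteps of the BSP model: local computation and communication are kept separate, and the cost analysis treated in the course charges each superstep for its computation, its communication volume, and one synchronisation.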

Lecturers
- Martin van Gijzen (TU Delft)
- Jonas Thies (TU Delft)