**Aim of the course**

The course aims to provide a thorough knowledge of designing and implementing parallel algorithms for problems in the area of scientific computing and big data.

Learning goals:

After completion of the course, the student is able to:

- design a parallel algorithm for a problem from the area of scientific computing or big data;

- analyse the cost of the algorithm in terms of computing time, communication time and synchronisation time;

- write a parallel program based on an algorithm that solves a problem;

- write a report on the algorithm, its analysis, and numerical experiments performed, in a concise and readable form;

- present the results in an oral presentation, in a clear manner, highlighting the most important ideas.

Today, parallel computers are appearing on our desktops. The advent of dualcore, quadcore, and octacore computers and the expected increase in the number of cores in the coming years causes a major change in our approach to software, such as the mathematical software we use in scientific computations and in the emerging area of big data computations.

Parallel computers drastically increase our computational capabilities and thus enable us to model more realistically in many application areas. To make efficient use of parallel computers, it is necessary to reorganize the structure of our computational methods. In particular, attention must be paid to the division of work among the different processors solving a problem in parallel and to the communication between them. Suitable parallel algorithms and systems software are needed to realize the capabilities of parallel computers.

We will discuss extensively the most recent developments in the area of parallel computers, ranging from multicore desktop PCs to clusters of PCs connected by switching devices, to massively parallel computers with distributed memory such as our national supercomputer Cartesius at SURFsara in Amsterdam.

The following subjects are treated:

- Types of existing parallel computers;

- Principles of parallel computation: distributing the work evenly and avoiding communication;

- The Bulk Synchronous Parallel (BSP) model as an idealized model of the parallel computer;

- Use of BSPlib software as the basis for architecture-independent programs;

- Parallel algorithms for the following problems: inner product computation, sorting, prime number sieving, Fast Fourier Transform, sparse matrix-vector multiplication, iterative solution of linear systems, graph matching;

- Analysis of the computation. communication and synchronization time requirements of these algorithms by the BSP model;

- Hands-on experience in the laboratory class, using the Cartesius supercomputer.

**Prerequisites**

Introductory course in linear algebra. Some knowledge of algorithms. Good knowledge of a modern programming language such as C, C++, C#, Python, or Java. Basic knowledge of C is helpful, as we will use this language in class, but we will organise the course in such a way that there will be time to adapt to it for those coming from another language.

**Rules about Homework / Exam**

The examination is in the form of an initial assignment (20%), a final assignment (40%), a presentation on the final assignment (20%), and a midterm exam (20%). The two assignments are carried out in the exercise/laboratory class and at home.

A written report must be returned for each assignment before the deadline. Students can work individually or in pairs (but not in larger teams) on the computer programs and can hand in a single report, provided each did a fair share of the work and can account for that. Each participant has to give an individual presentation on the chosen topic of the final assignment.

Presentations will be scheduled on 6, 13, and 20 December 2017.

The midterm exam must be passed with a grade of 5.5 (on a scale of 10).

- Docent: Rob Bisseling