2017-2018 / INFO8002-1

Large-scale database systems


25h Th, 10h Pr, 45h Proj.

Number of credits

 Master in data science (120 ECTS): 5 credits
 Master of science in computer science and engineering (120 ECTS): 5 credits
 Master in data science and engineering (120 ECTS): 5 credits
 Master in computer science (120 ECTS): 5 credits


Gilles Louppe

Language(s) of instruction

English language

Organisation and examination

Teaching in the first semester, review in January

Prerequisite and corequisite units

Prerequisite or corequisite units are presented within each program

Learning unit contents

Distributed systems have become ubiquitous in modern large-scale computer systems, such as the Internet, cloud computing centers or networks of connected objects. They are of primary importance for scalability and reliability reasons.
However, distributed systems remain notoriously difficult to build: they need to scale to hundreds or thousands of machines, tolerate crashes, cope with concurrent execution and keep the data they store consistent.
In this context, the course will cover concepts in a bottom-up fashion. We will first cover the foundational abstractions at the core of distributed systems, including basic abstractions and system assumptions, reliable broadcast, shared memory and consensus. We will then study distributed computing systems that are built on top of those components, including MapReduce and computational graph systems (Spark, TensorFlow). Similarly, we will study distributed storage systems, including distributed file systems, distributed key-value stores and blockchains.

Learning outcomes of the learning unit

At the end of the course, the student will understand the core building blocks of reliable distributed systems. He/she will also be acquainted with industrial distributed systems and their inner workings. Finally, he/she will have developed critical thinking about the benefits and limitations of these systems.

Prerequisite knowledge and skills

Programming experience. Basic knowledge of computer networks.

Planned learning activities and teaching methods

- Theoretical lectures, exercise sessions and programming tutorials. Invited speakers on specialized topics. 
- A programming project whose goal will be to implement a simple distributed system.
- An exploratory analysis project whose goal will be to experiment with a distributed computing framework (Spark).

Mode of delivery (face-to-face ; distance-learning)

Lectures will be taught face-to-face. Projects will be carried out remotely.

Recommended or required readings

Slides will be made publicly available on GitHub during the semester.
Part of the course will be based on "Introduction to Reliable and Secure Distributed Programming", Christian Cachin, Rachid Guerraoui, Luis Rodrigues, Springer. This book is recommended.

Assessment methods and criteria

- Oral exam (50%)
- Programming project 1 (25%)
- Programming project 2 (25%)
Completing both projects is mandatory in order to take the exam.

Work placement(s)

Organizational remarks

The course website is https://github.com/glouppe/info8002-large-scale-database-systems


Gilles Louppe (g.louppe@ulg.ac.be), Joeri Hermans (joeri.hermans@doct.ulg.ac.be, teaching assistant)