25h Th, 10h Pr, 45h Proj.
Number of credits
Language(s) of instruction
Organisation and examination
Teaching in the second semester
Prerequisite and corequisite units
Prerequisite or corequisite units are presented within each program
Learning unit contents
Many decision-making problems can be formalised as maximising a numerical reward (or, equivalently, minimising a cost) while interacting with an environment that is stochastic or (partially) unknown, exhibits little structure (e.g., it is neither linear nor convex), has a sequential nature (e.g., a sequence of decisions must be taken to reach an objective), and/or is adversarial (e.g., an opponent chooses its decisions so as to minimise your payoff, as is the case in poker).
Typical examples of such problems are:
- The design of artificial intelligences able to learn to play computer games,
- The placement of advertisements on webpages to maximize the number of clicks,
- The control of a rocket so that it safely reaches a target with minimum fuel cost,
- The synthesis of winning strategies for playing the stock market,
- The design of artificial intelligences for autonomous robots,
- The design of clinical trials.
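As an illustration of the advertisement-placement example above, the problem can be viewed as a multi-armed bandit: each advertisement is an arm with an unknown click probability, and the learner must balance trying all advertisements against showing the one that currently looks best. The sketch below uses the epsilon-greedy rule, one simple strategy among many. The click probabilities, the function name `epsilon_greedy` and its parameters are assumptions made for this example and are not taken from the course material.

```python
import random


def epsilon_greedy(click_probs, n_rounds=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a simulated Bernoulli bandit.

    click_probs: hypothetical click probability of each advertisement,
                 unknown to the learner and used only to simulate clicks.
    Returns (counts, estimates): number of times each advertisement was
    shown and the estimated click rate of each advertisement.
    """
    rng = random.Random(seed)
    n_arms = len(click_probs)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: show an advertisement chosen uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: show the advertisement with the best current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Simulated feedback: 1 if the user clicks, 0 otherwise.
        reward = 1.0 if rng.random() < click_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean reward of this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = epsilon_greedy([0.02, 0.05, 0.10])
```

Here `counts` shows how often each advertisement was displayed and `estimates` holds the learned click rates. A larger `epsilon` explores more and exploits less.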
Learning outcomes of the learning unit
At the end of the class, the student should (i) be familiar with a broad class of techniques for solving optimal control problems, (ii) be able to use these techniques and understand their main characteristics, and (iii) be able to read and understand a significant share of the scientific papers dedicated to this field of research, in particular those on reinforcement learning based approaches (also known as sampling-based approaches) for solving optimal sequential decision-making problems.
Among the different techniques that will be covered by this class, we can mention:
a. Dynamic programming and policy search techniques for Markov Decision Processes (MDPs).
b. Reinforcement learning techniques for MDPs.
c. Techniques for solving the Exploration/Exploitation tradeoff, with a special focus on those that apply to multi-armed bandit problems.
d. Monte-Carlo Tree Search techniques for single-player and multi-player environments.
e. Multi-stage stochastic programming techniques for problems with large action spaces.
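To give a flavour of item (a), the following is a minimal sketch of value iteration, a classical dynamic programming method for a finite MDP with known dynamics. The two-state MDP, the discount factor and the names `value_iteration`, `P` and `R` are hypothetical choices made for illustration only.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP with known dynamics.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is
    the expected immediate reward for taking action a in state s.
    Returns the value function and a greedy policy with respect to it.
    """
    n_states = len(P)

    def q_value(V, s, a):
        # Expected return of taking action a in state s, then following V.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = [0.0] * n_states
    while True:
        # Bellman optimality backup applied to every state.
        new_V = [max(q_value(V, s, a) for a in range(len(P[s])))
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    policy = [max(range(len(P[s])), key=lambda a: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy


# Hypothetical toy MDP: state 1 is rewarding, state 0 is not.
# In state 0, action 0 stays put and action 1 reaches state 1 with prob. 0.8.
# In state 1, action 0 stays (reward 1) and action 1 returns to state 0.
P = [
    [[(1.0, 0)], [(0.8, 1), (0.2, 0)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]
V, policy = value_iteration(P, R)
```

On this toy problem the greedy policy moves from state 0 towards state 1 and then stays there. Reinforcement learning methods (item b) target the same objective when `P` and `R` are not known and must be inferred from sampled transitions.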
Prerequisite knowledge and skills
Basic knowledge of systems theory, statistics, optimisation and machine learning.
Good coding skills are required.
Planned learning activities and teaching methods
The classes will combine theoretical lectures, analyses of scientific articles and exercises. The theoretical material will be taught mainly through a flipped-classroom approach.
Students will also have to work throughout the year on projects designed to implement the methodologies learned during the year on fairly simple examples.
Mode of delivery (face-to-face; distance-learning)
Recommended or required readings
The teaching material will be accessible on the class website, see: http://blogs.ulg.ac.be/damien-ernst/teaching/
Assessment methods and criteria
The evaluation consists of two parts: continuous assessment during the year, which counts for 50% of the points, and an oral examination at the end of the year.
Motivated students may have the opportunity to do a research internship in this field of artificial intelligence.