INFO2049-1

Duration

30h Th

Number of credits

	Master of Science (MSc) in Data Science	5 credits
	Master of Science (MSc) in Computer Science and Engineering	5 credits
	Master of Science (MSc) in Computer Science and Engineering (double degree with HEC)	5 credits
	Master of Science (MSc) in Data Science and Engineering	5 credits
	Master of Science (MSc) in Computer Science	5 credits
	Master of Science (MSc) in Computer Science (double degree with HEC)	5 credits
	Master in business engineering (120 ECTS)	5 credits
	Master in linguistics (120 ECTS) (double degree)	5 credits

Lecturer

Ashwin Ittoo

Language(s) of instruction

English language

Organisation and examination

Teaching in the first semester, review in January

Schedule

Schedule online

Units courses prerequisite and corequisite

Prerequisite or corequisite units are presented within each program

Learning unit contents

Objective
This course covers basic and advanced algorithms and techniques in natural language processing (NLP) and text mining/analytics. The algorithms covered have a machine learning underpinning.
The topics include those dealing with general machine learning (e.g. decision trees, Bayesian learning, Markov chains, neural networks) as well as those specific to NLP (e.g. language modelling, neural language models).
Deep learning techniques for NLP will also be covered, including RNNs, Seq2Seq, Transformer models and RoBERTa (from Facebook Research).
This course has strong theoretical and practical components. The practicals will mostly use R and scikit-learn (Python). Participants may use other languages with which they feel more comfortable (e.g. C, C++, C#, Java).
Students will also be asked to read and discuss recent scientific articles on deep learning and deep learning for NLP.
Take this course only if you have very good mathematical and programming skills.

Course Structure
1. Introduction

Introduction to machine learning
Decision tree classifiers

2. Vector Space Model and Information Retrieval

Representing words as vectors
Measuring text similarity (Levenshtein distance, Cosine Similarity)
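
As an illustration of the two similarity measures named above, here is a minimal sketch in plain Python. It is not taken from the course material: the function names, the whitespace tokenisation and the raw term-count vectors are assumptions made for the example.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts represented as term-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

Cosine similarity compares texts at the word level and ignores word order, whereas Levenshtein distance works on character sequences, so the two measures suit different tasks (document matching versus spelling variation, for instance).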

3. Feature Selection

Tf-idf (Term Frequency-Inverse Document Frequency)
Chi-squared measure
Mutual information
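
A minimal tf-idf sketch in plain Python is given below for orientation. It assumes one common variant (relative term frequency multiplied by the natural log of N/df); other variants exist, including smoothed ones used by libraries, and the variant used in the lectures may differ. The function name and tokenisation are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in tokenized for term in set(doc))
    weights = []
    for doc in tokenized:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(["the cat sat", "the dog barked"])
# "the" occurs in every document, so its weight is zero in this variant.
print(weights[0])
```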

4. Naïve-Bayes for Text Classification

Bayesian theory revision
Multinomial vs. Bernoulli Naïve-Bayes
Parameter estimation

5. Evaluating Models

Bootstrapping, cross-validation
Metrics: precision, recall, F-score
Metrics (Machine Translation): BLEU
Spearman rank correlation, Wilcoxon test (if time permits)
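
The classification metrics listed above can be sketched as follows for the binary case. This is an illustrative example, not course code: the function name and the `positive` parameter are assumptions, and undefined ratios are returned as 0.0 by convention here.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# 2 true positives, 1 false positive, 1 false negative.
print(precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```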

6. Language Models

Markov models
n-gram (tri-gram) models
Parameter estimation
Perplexity metric
Discounting methods and Katz Back-off
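
To make the perplexity metric concrete, the sketch below trains a bigram model and scores sentences with it. Note the simplifications: it uses bigrams rather than trigrams and add-one (Laplace) smoothing rather than the discounting and Katz back-off covered in the course. All function names and the `<s>`/`</s>` boundary markers are assumptions made for the example.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def perplexity(sentence, unigrams, bigrams):
    """exp of the negative average log-probability per predicted token."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = sum(math.log(bigram_prob(p, w, unigrams, bigrams))
                   for p, w in zip(tokens, tokens[1:]))
    return math.exp(-log_prob / (len(tokens) - 1))


unigrams, bigrams = train_bigram(["the cat sat", "the dog sat"])
# A word order seen in training should score a lower perplexity than a scrambled one.
print(perplexity("the cat sat", unigrams, bigrams))
print(perplexity("sat cat the", unigrams, bigrams))
```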

7. Revision: Artificial Neural Networks (ANN)

ANN architecture
Activation functions
Stochastic Gradient Descent

8. Neural Network Language Models (Neural language models)

Distributational semantics and distributed word representation
Word2Vec (see Distributed Word Representation, Mikolov et al., 2013)

9. Deep Learning for NLP

Introducing Recurrent Networks for NLP

10. Machine Translation (Statistical & Deep Learning)

Methods for Statistical Machine Translation
Seq2Seq Model for Machine Translation
Evaluation

11. Language Models (part ii)

Transformer Model
RoBERTa Model

Practical Sessions
Course participants are expected to work on a practical project, which will count for ~30-40% of the final grade. These projects will be comprehensive in the sense that they will encompass many of the different aspects taught in the lectures/practicals. Sample project topics include text classification, opinion mining, machine translation and language generation. Projects will be carried out in groups of 3 students.

Students are expected to have programming skills and implement the projects on their own.

Learning outcomes of the learning unit

Understand the underlying principles and algebraic formulations of machine learning models
Ability to apply these models to the task of information extraction from text and text classification
Synthesize various principles and algorithms introduced in the course to develop a full-fledged text analytics application (as part of the course project)
Implement text analytics solutions to support an organization's business intelligence activities
Formulate a strategy based on the acquired text analytics skills to optimize the value of an organization
Ability to perform research on and understand advanced topics in the field and to be informed on recent developments to adapt easily to changing requirements
Appreciate how the algorithms studied could solve real-life managerial issues
Communicate appropriately about text analytics projects/applications to various stakeholders

Prerequisite knowledge and skills

It is very important for course participants to have a very good background in:

Calculus (e.g. partial derivatives, chain rule)
Vector/matrix algebra
Statistical methods (e.g. probabilities, regressions)
Programming
Some knowledge of mathematical optimization

Note that the above are essential for you to succeed in this course. We will assume that you are already well-versed in these topics.
Support will be offered to students

Lecture notes
Online references

Planned learning activities and teaching methods

The course carries 5 credits and therefore requires 150 hours of work (1 credit = 30 hours).
Theory lectures = 18-22 hours

Self-study for exam = approx. 70 hours
Practical lectures = 9-12 hours
Working on practical exercises and projects = approx. 80 hours
Total = 150 hours (5 credits)

Mode of delivery (face-to-face ; distance-learning)

Lectures
Practical (during lectures and as homework)

Assessment methods and criteria

Final written exam: 70%
Final practical project: 30%
(May be adjusted during the course)

Work placement(s)

Organizational remarks

Contacts

Ashwin Ittoo, ashwin.ittoo@uliege.be

Adaptation of teaching commitments following the COVID-19 pandemic for the May-June 2020 session

Teaching methods implemented : distance-learning

Assessment subjects

Assessment methods

Contacts

Adaptation of teaching commitments following the COVID-19 pandemic for the Aug-Sept 2020 session

Assessment subjects

Same as 1st session.

Assessment methods

Distance/Online.
Details will be communicated to the students concerned well before the exams.

Contacts

ashwin.ittoo@uliege.be

Items online

Lecture Notes

Web and Text Analytics
