Introduction to Reinforcement Learning

Course description and context

Reinforcement learning (RL) is a body of theory and techniques for optimal sequential decision making. In its basic setting, at each time step an agent selects an action; as a result, it collects a reward and the system state evolves. The agent then observes the new state and decides on the next action, with the objective of maximizing the total accumulated reward. Reinforcement learning has found numerous applications, including online services (ad placement, recommendation systems), game playing (chess, Atari, Go, etc.), control, and robotics. In this course we will first introduce the underlying mathematical framework (Markov decision processes) and its solution methods, including dynamic programming, Monte Carlo methods, and temporal-difference learning.
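The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the environment `TwoStateEnv`, its reward rule, and the two policies are hypothetical and are not part of the course material.

```python
import random

class TwoStateEnv:
    """Hypothetical toy environment: states 0 and 1, actions 0 (stay) and 1 (switch)."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # Assumed reward rule: 1 if the agent acts from state 1, else 0.
        reward = 1.0 if self.state == 1 else 0.0
        if action == 1:
            self.state = 1 - self.state  # the system state evolves
        return self.state, reward

def run_episode(env, policy, horizon):
    """Run the select-action / collect-reward / observe-state loop."""
    total = 0.0
    state = env.state
    for _ in range(horizon):
        action = policy(state)            # agent selects an action
        state, reward = env.step(action)  # reward collected, new state observed
        total += reward                   # accumulate the total reward
    return total

def greedy_policy(state):
    # Move to state 1, then stay there.
    return 1 if state == 0 else 0

def random_policy(state):
    return random.choice([0, 1])

total_greedy = run_episode(TwoStateEnv(), greedy_policy, horizon=10)
total_random = run_episode(TwoStateEnv(), random_policy, horizon=10)
```

In this toy setting the policy that moves to state 1 and stays there accumulates at least as much reward as the random one, which is the kind of comparison the course formalizes through Markov decision processes.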


AYESTA MORATE, URTZI; Universidad del País Vasco/Euskal Herriko Unibertsitatea; Visiting (Ikerbaske); Doctor; Not bilingual; Computer Science and Intelligence


Knowledge of the theoretical principles of reinforcement learning. 50.0 %
Develop reinforcement learning algorithms adapted to specific problems. 50.0 %

Types of teaching

Type                 In-person hours   Non-in-person hours   Total hours
Computer practicals  15                45                    60

Learning activities

Name                          Hours   Percentage of in-person attendance
Lectures                      15.0    100 %
Group work                    45.0    0 %
Work with computer equipment  15.0    100 %

Assessment systems

Name                                 Minimum weighting   Maximum weighting
Essay, individual and/or group work  25.0 %              50.0 %
Written exam                         50.0 %              75.0 %

Course learning outcomes

- Understand the basics of sequential decision making.

- Formulate RL algorithms that can optimally solve a sequential decision problem.

- Gain a mathematical understanding of convergence results of RL algorithms.

- Learn how RL can be combined with parametric function approximation, including deep learning, to find good approximate solutions to problems of real-world complexity.


Introduction to Reinforcement Learning

Topics: Applications of RL, RL successes, RL vs. supervised learning, major components of RL, Learning and planning, prediction vs. control

Recap of Markov Processes

Topics: Markov Chains, Markov Reward Processes, Markov Decision Processes

Stochastic dynamic programming

Topics: Principle of optimality, dynamic programming, Bellman optimality equation, Value function, Iterative schemes to solve Bellman (Value Iteration, Policy iteration)
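As a minimal sketch of one of the iterative schemes listed above, value iteration repeatedly applies the Bellman optimality update until the value function stops changing. The two-state MDP below, its rewards, and the discount factor `GAMMA` are hypothetical numbers chosen only for illustration.

```python
# Hypothetical 2-state, 2-action MDP (illustrative numbers, not course data).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # assumed discount factor

def q_value(V, s, a):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])

def value_iteration(tol=1e-10):
    """Iterate V <- max_a q_value(V, s, a) until successive iterates agree."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new

V = value_iteration()
# Greedy policy with respect to the computed value function.
policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
```

For this toy MDP the fixed point can be checked by hand: staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and state 0 is worth 0.9 times that.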

Prediction: How to learn the performance?

Topics: Monte Carlo, Temporal-Difference Learning (TD(0), TD(λ))

Control: How to learn the optimal control?

Topics: State-action function, Online and Offline learning, Exact methods (Q-learning and SARSA)
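A tabular Q-learning update can be sketched as follows. The dynamics in `step`, the step size `ALPHA`, the exploration rate `EPSILON`, and the discount factor `GAMMA` are all assumptions made for this illustration; the agent learns only from sampled transitions, without access to the model.

```python
import random

GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.2  # assumed hyperparameters

def step(state, action):
    """Hypothetical deterministic dynamics: action 1 switches state, action 0 stays.
    Staying in state 1 pays reward 1; every other transition pays 0."""
    reward = 1.0 if (state == 1 and action == 0) else 0.0
    next_state = 1 - state if action == 1 else state
    return next_state, reward

def q_learning(num_steps, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    state = 0
    for _ in range(num_steps):
        # Epsilon-greedy behaviour policy.
        if rng.random() < EPSILON:
            action = rng.choice((0, 1))
        else:
            action = max((0, 1), key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # Off-policy target: bootstrap from the best action in the next state.
        target = reward + GAMMA * max(Q[(next_state, a)] for a in (0, 1))
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state
    return Q

Q = q_learning(num_steps=20000)
greedy = {s: max((0, 1), key=lambda a: Q[(s, a)]) for s in (0, 1)}
```

SARSA differs only in the target: it bootstraps from the action actually taken in the next state instead of the maximizing one.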

Convergence of learning algorithms

Topics: Convergence of random variables, Martingales, stochastic approximation, Convergence of TD(0), TD(λ) and Q-learning

Exploration vs. exploitation

Topics: Multi-armed bandits, optimality of index policies (Gittins), Regret, Optimality of logarithmic regret, UCB Algorithm

Approximate Solution Methods

Topics: Value function approximation, Stochastic gradient descent, approximation by feature representation, linear value function approximation (convergence), control with value function approximation, Action-value function approximation, deep reinforcement learning, batch reinforcement learning, experience replay, Algorithms with superhuman performance (AlphaGo, Atari games)


Basic bibliography

Sean Meyn, Feedback systems and reinforcement learning, 2020

Dimitri P. Bertsekas, Reinforcement learning and optimal control, 2019

M. L. Puterman, Markov Decision Processes. Wiley, 1994.

Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 2018.

V. Borkar, Stochastic approximation: a dynamical systems viewpoint, Tata Institute of Fundamental Research, 2008

