Course

Introduction to Reinforcement Learning

General course information

Mode: In person
Language: English

Course description and context

Reinforcement learning (RL) is a body of theory and techniques for optimal sequential decision making. In its basic setting, at each time step an agent selects an action; as a result, it collects a reward and the system state evolves. The agent then observes the new state and decides on the next action, with the objective of maximizing the total accumulated reward. Reinforcement learning has found numerous applications, including online services (ad placement, recommendation systems), game playing (chess, Atari, Go), control, and robotics. In this course we will first introduce the underlying mathematical framework (Markov decision processes) and its solution methods, including dynamic programming, Monte Carlo methods, and temporal-difference learning.
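The interaction loop described above can be sketched in a few lines of Python. This is only an illustration: the five-state chain environment, the `step` and `run_episode` functions, and the reward of 1 for reaching the last state are all assumptions made for this example, not part of the course material.

```python
def step(state, action, n_states=5):
    """Hypothetical chain environment: action is -1 (left) or +1 (right)."""
    next_state = min(max(state + action, 0), n_states - 1)
    done = next_state == n_states - 1          # the last state is terminal
    reward = 1.0 if done else 0.0              # reward only on reaching it
    return next_state, reward, done


def run_episode(policy, max_steps=100):
    """Run one episode: the agent acts, collects a reward, observes the new state."""
    state, total_reward = 0, 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent selects an action
        state, reward, done = step(state, action)
        total_reward += reward                  # accumulate the reward
        if done:
            break
    return total_reward, state


always_right = lambda state: +1
print(run_episode(always_right))  # → (1.0, 4)
```

A policy that always moves left never reaches the terminal state, so its accumulated reward stays at zero; the learning methods covered in the course aim to find the better policy from experience.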

Teaching staff

Name	Institution	Category	Doctorate	Teaching profile	Area	Email
AYESTA MORATE, URTZI	Universidad del País Vasco/Euskal Herriko Unibertsitatea	Visiting (Ikerbaske)	Doctor	Non-bilingual	Computer Science and Artificial Intelligence	urtzi.ayesta@ehu.eus

Competencies

Description	Weight
Knowledge of the theoretical principles of reinforcement learning.	50.0 %
Developing reinforcement learning algorithms tailored to specific problems.	50.0 %

Types of teaching

Type	In-class hours	Out-of-class hours	Total hours
Lecture	15	0	15
Computer lab	15	45	60

Learning activities

Description	Hours	Percentage of in-class attendance
Lectures	15.0	100 %
Group work	45.0	0 %
Work with computer equipment	15.0	100 %

Assessment systems

Description	Minimum weighting	Maximum weighting
Essay, individual and/or group work	25.0 %	50.0 %
Written exam	50.0 %	75.0 %

Learning outcomes of the course

- Understand the basics of sequential decision making.

- Formulate RL algorithms that can optimally solve a sequential decision problem.

- Gain a mathematical understanding of convergence results of RL algorithms.

- Learn how RL can be combined with parametric function approximation, including deep learning, to find good approximate solutions to problems of real-world complexity.

Syllabus

Introduction to Reinforcement Learning

Topics: Applications of RL, RL successes, RL vs. supervised learning, major components of RL, learning and planning, prediction vs. control

Recap of Markov Processes

Topics: Markov Chains, Markov Reward Processes, Markov Decision Processes

Stochastic dynamic programming

Topics: Principle of optimality, dynamic programming, Bellman optimality equation, value function, iterative schemes to solve the Bellman equation (value iteration, policy iteration)
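As an illustration of the iterative schemes listed above, here is a minimal sketch of value iteration on a two-state, two-action MDP. The MDP itself, the transition format `P[s][a] = [(probability, next_state, reward), ...]`, the discount factor 0.9, and the function names are assumptions chosen for this example. The sweep updates values in place (a Gauss-Seidel variant), which converges to the same fixed point.

```python
def q_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-10):
    """Repeatedly apply the Bellman optimality operator until values stop changing."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            best = max(q_value(P, V, s, a, gamma) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best                      # in-place update
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function
    policy = [max(range(len(P[s])), key=lambda a: q_value(P, V, s, a, gamma))
              for s in range(len(P))]
    return V, policy


# Hypothetical MDP: action 0 = stay, action 1 = move to the other state.
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],   # state 0
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],   # state 1
]
V, policy = value_iteration(P)
print(V, policy)  # V is approximately [19.0, 20.0]; policy is [1, 0]
```

The result can be checked by hand: staying in state 1 forever yields 2 / (1 - 0.9) = 20, and moving there from state 0 yields 1 + 0.9 * 20 = 19.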

Prediction: How to learn the performance?

Topics: Monte Carlo, Temporal-Difference Learning (TD(0), TD(λ))
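A minimal sketch of TD(0) prediction follows, assuming a symmetric random walk on five non-terminal states with a reward of 1 only when the walk exits on the right. The environment, step size, number of episodes, and the name `td0_random_walk` are illustrative assumptions. The policy being evaluated is the uniformly random walk itself.

```python
import random


def td0_random_walk(n_states=5, episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values of a random walk with the TD(0) update rule."""
    rng = random.Random(seed)
    # Indices 0 and n_states + 1 are terminal; their values stay at 0.
    V = [0.0] * (n_states + 2)
    for _ in range(episodes):
        s = (n_states + 1) // 2                      # start in the middle
        while 0 < s < n_states + 1:
            s_next = s + rng.choice((-1, 1))         # step left or right
            r = 1.0 if s_next == n_states + 1 else 0.0
            # TD(0): move V[s] toward the bootstrapped target r + gamma * V[s_next]
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V[1:-1]


print(td0_random_walk())
```

For this walk the exact values are 1/6, 2/6, ..., 5/6, so the printed estimates should increase from left to right and fluctuate around those numbers; with a constant step size they do not settle exactly.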

Control: How to learn the optimal control?

Topics: State-action function, Online and Offline learning, Exact methods (Q-learning and SARSA)
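The following is a small sketch of tabular Q-learning, assuming a deterministic five-state chain where reaching the last state gives a reward of 1. To keep the example short, the behaviour policy is uniformly random rather than epsilon-greedy; this works because Q-learning is off-policy. The environment, hyperparameters, and the name `q_learning_chain` are assumptions for illustration only.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q[s][a] on a chain; a = 0 moves left, a = 1 moves right."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]    # Q[goal] is never updated
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.randrange(2)                  # uniformly random behaviour
            s_next = min(max(s + (1 if a == 1 else -1), 0), goal)
            r = 1.0 if s_next == goal else 0.0
            # Q-learning target uses the greedy value of the next state
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q


Q = q_learning_chain()
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print(greedy)  # → [1, 1, 1, 1]
```

SARSA differs only in the target: it would use the value of the action actually taken in the next state instead of the maximum, which makes it on-policy.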

Convergence of learning algorithms

Topics: Convergence of random variables, Martingales, stochastic approximation, Convergence of TD(0), TD(λ) and Q-learning

Exploration vs. exploitation

Topics: Multi-armed bandits, optimality of index policies (Gittins), Regret, Optimality of logarithmic regret, UCB Algorithm
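As a rough illustration of the exploration-exploitation trade-off, the sketch below runs the UCB1 index rule on a Bernoulli bandit. The arm means, the horizon, and the function name `ucb1` are assumptions for this example; the index used is the empirical mean plus sqrt(2 ln t / n_i).

```python
import math
import random


def ucb1(means, horizon=5000, seed=0):
    """Play a Bernoulli bandit with UCB1 and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k       # number of pulls per arm
    sums = [0.0] * k       # total reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # initialisation: pull every arm once
        else:
            # Index = empirical mean + exploration bonus
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


print(ucb1([0.2, 0.5, 0.8]))
```

The suboptimal arms should receive only a small share of the pulls, growing roughly logarithmically with the horizon, which is the behaviour behind the logarithmic regret results listed above.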

Approximate Solution Methods

Topics: Value function approximation, Stochastic gradient descent, approximation by feature representation, linear value function approximation (convergence), control with value function approximation, Action-value function approximation, deep reinforcement learning, batch reinforcement learning, experience replay, Algorithms with superhuman performance (AlphaGo, Atari games)

Bibliography

Basic bibliography

Sean Meyn, Feedback Systems and Reinforcement Learning, 2020.

Dimitri P. Bertsekas, Reinforcement Learning and Optimal Control, 2019.

M. L. Puterman, Markov Decision Processes, Wiley, 1994.

Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 2018.

V. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Tata Institute of Fundamental Research, 2008.

Programme: Master's in Computational Engineering and Intelligent Systems