# Subject

- Modality: Classroom-based
- Language: English

## Course description and context

Reinforcement learning (RL) is a body of theory and techniques for optimal sequential decision making. In its basic setting, at each time step an agent selects an action; as a result, it collects a reward and the system state evolves. The agent then observes the new state and decides on the next action, with the objective of maximizing the total accumulated reward. Reinforcement learning has found numerous applications, including online services (ad placement, recommendation systems), game playing (chess, Atari, Go), control, and robotics. This course first introduces the underlying mathematical framework (Markov decision processes) and then its solution methods, including dynamic programming, Monte Carlo methods, and temporal-difference learning.
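The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the environment `TwoStateEnv`, the function `run_episode`, and both policies are hypothetical names invented for this sketch and are not part of the course material.

```python
import random


class TwoStateEnv:
    """Hypothetical toy environment: states 0 and 1, actions 0 (stay) and 1 (switch)."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # Switching flips the state; staying keeps it unchanged.
        if action == 1:
            self.state = 1 - self.state
        # The agent collects reward 1 whenever the system ends up in state 1.
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward


def run_episode(env, policy, horizon):
    """Run the agent-environment loop for `horizon` steps; return the total reward."""
    state = env.state
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(state)            # the agent selects an action
        state, reward = env.step(action)  # the state evolves, a reward is collected
        total_reward += reward
    return total_reward


def random_policy(state):
    # Ignores the state and picks an action uniformly at random.
    return random.choice([0, 1])


def move_then_stay_policy(state):
    # Switch to state 1 if not there yet, then stay.
    return 1 if state == 0 else 0
```

Calling `run_episode(TwoStateEnv(), move_then_stay_policy, 10)` accumulates one unit of reward per step, whereas `random_policy` typically collects less.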

## Competences

| Name | Weight |
| --- | --- |
| Knowledge of the theoretical principles of reinforcement learning. | 50.0 % |

| Type | Classroom hours | Out-of-class hours | Total hours |
| --- | --- | --- | --- |
| Lecture | 15 | 0 | 15 |
| Computer practice | 15 | 45 | 60 |

| Name | Hours | Percentage of classroom hours |
| --- | --- | --- |
| Lectures | 15.0 | 100 % |
| Work with computer equipment | 15.0 | 100 % |
| Group work | 45.0 | 0 % |

## Assessment systems

| Name | Minimum weighting | Maximum weighting |
| --- | --- | --- |
| Individual and/or group work, essay | 25.0 % | 50.0 % |
| Written exam | 50.0 % | 75.0 % |

## Learning outcomes

- Understand the basics of sequential decision making.

- Formulate RL algorithms that can optimally solve a sequential decision problem.

- Gain a mathematical understanding of convergence results of RL algorithms.

- Learn how RL can be combined with parametric function approximation, including deep learning, to find good approximate solutions to problems of real-world complexity.

## Syllabus

Introduction to Reinforcement Learning

Topics: Applications of RL, RL successes, RL vs. supervised learning, major components of RL, learning and planning, prediction vs. control

Recap of Markov Processes

Topics: Markov Chains, Markov Reward Processes, Markov Decision Processes

Stochastic dynamic programming

Topics: Principle of optimality, dynamic programming, Bellman optimality equation, value function, iterative schemes to solve the Bellman equation (value iteration, policy iteration)
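As a rough illustration of value iteration, the sketch below repeatedly applies the Bellman optimality update to a small tabular MDP. The data layout (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as the expected reward), the discount factor, and the two-state example are assumptions made for this sketch only.

```python
def q_value(P, R, V, s, a, gamma):
    """One-step lookahead: expected reward plus discounted value of the next state."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate V <- max_a [R + gamma * P V] until successive iterates differ by < tol."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        new_V = [
            max(q_value(P, R, V, s, a, gamma) for a in range(len(P[s])))
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    # Extract a policy that is greedy with respect to the final value function.
    policy = [
        max(range(len(P[s])), key=lambda a: q_value(P, R, V, s, a, gamma))
        for s in range(n_states)
    ]
    return V, policy


# Hypothetical two-state MDP: action 0 stays, action 1 switches state;
# the reward is 1 whenever the next state is state 1.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # transitions from state 0
    [[(1.0, 1)], [(1.0, 0)]],  # transitions from state 1
]
R = [
    [0.0, 1.0],  # rewards in state 0
    [1.0, 0.0],  # rewards in state 1
]
V, policy = value_iteration(P, R)
```

In this toy example the greedy policy moves to state 1 and stays there, and both state values approach 1 / (1 - 0.9) = 10.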

Prediction: How to learn the performance?

Topics: Monte Carlo, Temporal-Difference Learning (TD(0), TD(λ))

Control: How to learn the optimal control?

Topics: State-action function, online and offline learning, exact methods (Q-learning and SARSA)
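A minimal sketch of tabular Q-learning with epsilon-greedy exploration is given below. The environment function `toy_step`, the hyperparameter values, and the fixed seed are assumptions chosen for illustration; they do not come from the course material.

```python
import random


def toy_step(state, action):
    """Hypothetical two-state dynamics: action 1 switches state, action 0 stays."""
    next_state = 1 - state if action == 1 else state
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def q_learning(step, n_states, n_actions, n_steps=20_000,
               alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a state-action value table from a single stream of experience."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(n_steps):
        # Epsilon-greedy behaviour: explore with probability epsilon.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # Off-policy target: bootstrap from the best action in the next state.
        td_target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (td_target - Q[state][action])
        state = next_state
    return Q


Q = q_learning(toy_step, n_states=2, n_actions=2)
greedy_policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
```

SARSA differs only in the target: it bootstraps from the action actually taken in the next state instead of the maximizing one.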

Convergence of learning algorithms

Topics: Convergence of random variables, martingales, stochastic approximation, convergence of TD(0), TD(λ) and Q-learning

Exploration vs. exploitation

Topics: Multi-armed bandits, optimality of index policies (Gittins), regret, optimality of logarithmic regret, UCB algorithm
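The exploration-exploitation trade-off can be illustrated with a sketch of the UCB1 index rule on Bernoulli arms. The function name `ucb1`, the arm means, the horizon, and the seed are assumptions for this example; the exploration bonus used here is the standard `sqrt(2 ln t / n_i)` form.

```python
import math
import random


def ucb1(arm_means, n_rounds, seed=0):
    """Play a Bernoulli bandit with UCB1; return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total reward per arm
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest empirical mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1(arm_means=[0.2, 0.8], n_rounds=2000)
```

With a gap this large between the two arms, the better arm ends up with the large majority of the pulls, while the worse arm is sampled only about logarithmically often.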

Approximate Solution Methods

Topics: Value function approximation, stochastic gradient descent, approximation by feature representation, linear value function approximation (convergence), control with value function approximation, action-value function approximation, deep reinforcement learning, batch reinforcement learning, experience replay, algorithms with superhuman performance (AlphaGo, Atari games)

## Bibliography

#### Basic bibliography

Sean Meyn, Feedback systems and reinforcement learning, 2020

Dimitri P. Bertsekas, Reinforcement learning and optimal control, 2019

M. L. Puterman, Markov Decision Processes. Wiley, 1994.

Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 2018.

V. Borkar, Stochastic approximation: a dynamical systems viewpoint, Tata Institute of Fundamental Research, 2008