Subject

Introduction to Reinforcement Learning

General information about the subject

Modality: In-person (classroom)
Language: English

Course description and context

Reinforcement learning (RL) is a body of theory and techniques for optimal sequential decision making. In its basic setting, at each time step an agent selects an action; as a result, it collects a reward and the system state evolves. The agent then observes the new state and decides on the next action, with the objective of maximizing the total accumulated reward. Reinforcement learning has found numerous applications, including online services (ad placement, recommendation systems), game playing (chess, Atari, Go), control, and robotics. In this course we will first introduce the underlying mathematical framework (Markov decision processes) and its solution methods, including dynamic programming, Monte Carlo methods, and temporal-difference learning.
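
The interaction loop described above can be illustrated with a minimal sketch. The two-state environment, the `step` function, and both policies are hypothetical and chosen only for illustration; they are not part of the course material.

```python
import random

def step(state, action):
    """Hypothetical two-state environment: returns (next_state, reward)."""
    # Taking action 1 in state 0 pays a reward of 1 and moves to state 1.
    # Every other (state, action) pair pays 0 and returns to state 0.
    if state == 0 and action == 1:
        return 1, 1.0
    return 0, 0.0

def run_episode(policy, horizon=10, start_state=0):
    """Run the agent-environment loop and return the total accumulated reward."""
    state, total = start_state, 0.0
    for _ in range(horizon):
        action = policy(state)               # the agent selects an action
        state, reward = step(state, action)  # a reward is collected, the state evolves
        total += reward                      # objective: accumulate as much as possible
    return total

def always_one(state):
    """Policy that always plays action 1."""
    return 1

def random_policy(state):
    """Policy that picks one of the two actions uniformly at random."""
    return random.choice([0, 1])

total_fixed = run_episode(always_one, horizon=10)
total_random = run_episode(random_policy, horizon=10)
```

In this toy environment the reward can only be collected every other step, so no policy can accumulate more than half the horizon.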

Teaching staff

Name	Institution	Category	Doctorate	Teaching profile	Area	Email
AYESTA MORATE, URTZI	Euskal Herriko Unibertsitatea	Ikerbasque Visitor	Doctor	Monolingual	Computer Science and Artificial Intelligence	urtzi.ayesta@ehu.eus

Competencies

Name	Weight
Knowledge of the theoretical principles of reinforcement learning.	50.0 %
Develop reinforcement learning algorithms adapted to specific problems.	50.0 %

Types of teaching

Type	Classroom hours	Out-of-classroom hours	Total hours
Lecture	15	0	15
Computer practicals	15	45	60

Types of teaching

Name	Hours	Percentage of classroom hours
Lectures	15.0	100 %
Work with computer equipment	15.0	100 %
Group work	45.0	0 %

Assessment systems

Name	Minimum weighting	Maximum weighting
Individual and/or group work, essay	25.0 %	50.0 %
Written exam	50.0 %	75.0 %

Learning outcomes

- Understand the basics of sequential decision making.

- Formulate RL algorithms that can optimally solve a sequential decision problem.

- Gain a mathematical understanding of convergence results for RL algorithms.

- Learn how RL can be combined with parametric function approximation, including deep learning, to find good approximate solutions to problems of real-world complexity.

Syllabus

Introduction to Reinforcement Learning

Topics: Applications of RL, RL successes, RL vs. supervised learning, major components of RL, learning and planning, prediction vs. control

Recap of Markov Processes

Topics: Markov Chains, Markov Reward Processes, Markov Decision Processes

Stochastic dynamic programming

Topics: Principle of optimality, dynamic programming, Bellman optimality equation, value function, iterative schemes for solving the Bellman equation (value iteration, policy iteration)
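
As an illustration of the iterative schemes listed above, the following is a minimal sketch of value iteration, which repeatedly applies the Bellman optimality backup until the values stop changing. The two-state MDP (`P`, `R`), the discount factor, and the tolerance are assumptions made for this example only.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (probability, next_state); R[s][a] is the expected reward."""
    def q(s, a, V):
        # One-step lookahead: immediate reward plus discounted expected next value.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q(s, a, V) for a in P[s])  # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(P[s], key=lambda a: q(s, a, V)) for s in P}
    return V, policy

# Hypothetical MDP: staying in state 1 pays 1 per step, everything else pays 0.
P = {
    0: {"stay": [(1.0, 0)], "move": [(1.0, 1)]},
    1: {"stay": [(1.0, 1)], "move": [(1.0, 0)]},
}
R = {
    0: {"stay": 0.0, "move": 0.0},
    1: {"stay": 1.0, "move": 0.0},
}
V, policy = value_iteration(P, R)
```

For this example the fixed point can be checked by hand: V(1) = 1 / (1 - 0.9) = 10 and V(0) = 0.9 * V(1) = 9, with the greedy policy moving from state 0 and staying in state 1.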

Prediction: How to learn the performance?

Topics: Monte Carlo, Temporal-Difference Learning (TD(0), TD(λ))
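
To make the prediction problem concrete, here is a minimal sketch of tabular TD(0) evaluating a fixed (uniformly random) policy. The five-state random walk, the constant step size, and the number of episodes are assumptions chosen for illustration; in this walk the true value of state i (counting from 0) is (i + 1) / 6.

```python
import random

def td0_random_walk(episodes=20000, alpha=0.01, gamma=1.0, seed=0):
    """Estimate state values of a 5-state random walk with TD(0)."""
    rng = random.Random(seed)
    n = 5
    V = [0.5] * n  # value estimates for the non-terminal states 0..4
    for _ in range(episodes):
        s = n // 2  # every episode starts in the middle state
        while True:
            s2 = s + rng.choice((-1, 1))  # step left or right with equal probability
            if s2 < 0:                    # left exit: terminal, reward 0
                reward, v_next, done = 0.0, 0.0, True
            elif s2 >= n:                 # right exit: terminal, reward 1
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, V[s2], False
            # TD(0) update: move V[s] toward the bootstrapped one-step target.
            V[s] += alpha * (reward + gamma * v_next - V[s])
            if done:
                break
            s = s2
    return V

estimates = td0_random_walk()
```

With a constant step size the estimates keep fluctuating around the true values rather than converging exactly, which is one motivation for the decreasing step sizes studied later in the course.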

Control: How to learn the optimal control?

Topics: State-action function, online and offline learning, exact methods (Q-learning and SARSA)
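
The control problem can be sketched with tabular Q-learning. The corridor environment below is hypothetical: the agent starts at the left end, actions move it left or right, and reaching the right end pays a reward of 1 and ends the episode. Because Q-learning is off-policy, this sketch assumes a uniformly random behaviour policy, which is enough to visit every state-action pair.

```python
import random

def q_learning_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a deterministic corridor; returns Q[s][a]."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]  # action 0 = left, action 1 = right
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.randrange(2)                       # random behaviour policy
            s2 = max(s - 1, 0) if a == 0 else s + 1    # walls keep the agent inside
            reward = 1.0 if s2 == goal else 0.0
            # Q-learning target: bootstrap from the best action in the next state.
            target = reward if s2 == goal else reward + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning_corridor()
greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(4)]
```

SARSA differs only in the target: it bootstraps from the action actually taken in the next state instead of the maximizing one, so with a random behaviour policy it would evaluate that random policy rather than the optimal one.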

Convergence of learning algorithms

Topics: Convergence of random variables, martingales, stochastic approximation, convergence of TD(0), TD(λ) and Q-learning
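
A simple instance of stochastic approximation is the Robbins-Monro iteration for estimating a mean from noisy samples, which has the same "old estimate plus step size times error" form as the TD and Q-learning updates. The sketch below assumes step sizes 1/n, in which case the iterate coincides with the running sample mean; the Gaussian samples are illustrative only.

```python
import random

def robbins_monro_mean(samples):
    """Iterate x <- x + (1/n) * (y_n - x) over the samples and return x."""
    x = 0.0
    for n, y in enumerate(samples, start=1):
        x += (y - x) / n  # the step size 1/n sums to infinity, its square does not
    return x

rng = random.Random(0)
samples = [rng.gauss(3.0, 1.0) for _ in range(10000)]
estimate = robbins_monro_mean(samples)
sample_mean = sum(samples) / len(samples)
```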

Exploration vs. exploitation

Topics: Multi-armed bandits, optimality of index policies (Gittins), regret, optimality of logarithmic regret, UCB algorithm
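
The exploration-exploitation trade-off can be illustrated with a minimal sketch of the UCB1 index policy on a Bernoulli bandit. The arm means, the horizon, and the exploration bonus sqrt(2 ln t / n_i) are assumptions for this example; other UCB variants use different constants.

```python
import math
import random

def ucb1(means, horizon=5000, seed=0):
    """Play a Bernoulli bandit with UCB1 and return the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls of each arm
    sums = [0.0] * k   # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialize the estimates
        else:
            # Index = empirical mean + bonus that shrinks as the arm is pulled more.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

pulls = ucb1([0.2, 0.5, 0.8])
```

The suboptimal arms are still pulled occasionally, but only a number of times that grows logarithmically with the horizon, which is the behaviour the regret analysis makes precise.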

Approximate Solution Methods

Topics: Value function approximation, stochastic gradient descent, approximation by feature representation, linear value function approximation (convergence), control with value function approximation, action-value function approximation, deep reinforcement learning, batch reinforcement learning, experience replay, algorithms with superhuman performance (AlphaGo, Atari games)

Bibliography

Basic bibliography

Sean Meyn, Feedback Systems and Reinforcement Learning, 2020.

Dimitri P. Bertsekas, Reinforcement Learning and Optimal Control, 2019.

M. L. Puterman, Markov Decision Processes, Wiley, 1994.

Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 2018.

V. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Tata Institute of Fundamental Research, 2008.

Master in Computational Engineering and Intelligent Systems
