Q-Learning and the Hose Transport Application: Videos

A set of robots is attached to a hose, which is modeled as a line segment between them. Agents are first trained using Q-Learning not to stretch the hose beyond its nominal maximum length and not to collide with each other. The robot carrying the tip of the hose must reach the goal, which is represented as a green dot.
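As a rough illustration of the constraints described above, the following sketch computes a reward from the robot positions. It is not the reward used in our experiments: the function names, the reward values (-1, 0, +1), the thresholds, and the convention that the first robot in the list carries the tip of the hose are all assumptions made for this example.

```python
import math
from itertools import combinations


def hose_segment_lengths(positions):
    """Length of each hose segment between consecutive robots."""
    return [math.dist(a, b) for a, b in zip(positions, positions[1:])]


def reward(positions, goal, max_segment_length, min_separation,
           goal_tolerance=0.0):
    """Hypothetical reward: penalize stretching and collisions, reward the goal.

    positions[0] is assumed to be the robot carrying the tip of the hose.
    """
    # Constraint 1: no hose segment may exceed its nominal maximum length.
    if any(length > max_segment_length
           for length in hose_segment_lengths(positions)):
        return -1.0
    # Constraint 2: no two robots may get closer than min_separation.
    if any(math.dist(a, b) < min_separation
           for a, b in combinations(positions, 2)):
        return -1.0
    # Task: the tip robot reaches the goal (the green dot in the videos).
    if math.dist(positions[0], goal) <= goal_tolerance:
        return 1.0
    return 0.0
```

For example, `reward([(0, 0), (3, 0)], goal=(5, 5), max_segment_length=1.5, min_separation=0.5)` is negative because the single segment is 3 units long.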

Partially Constrained Models (2011)

In these videos, we show the difference between learning a task directly in a computationally expensive environment (GEDS) and first training agents on a PCM, then testing the learned policies on the target task. These videos are intended as complementary material for the paper submitted to the Robotics and Autonomous Systems journal:

"Transfer Learning with Partially Constrained Models: application to reinforcement learning of linked multicomponent robot system control". Borja Fernandez-Gauna, Jose Manuel Lopez-Guede, Manuel Graña

-First trial-goal example. Video: media:RAS_-_PCM_1c.avi

-Second trial-goal example. Video: media:RAS_-_PCM_2c.avi

Round-Robin Distributed Q-Learning (2011)

In this case, the state is fully observable but the agents do not explicitly coordinate. The reward signal is shared by both agents. Instead of using typical Q-Learning, we use our Round-Robin Cooperative Multi-Agent Q-Learning variation of the algorithm, which forces agents to take actions one by one. After 1000 episodes for each configuration, the system reached an optimal policy for all configurations.
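The turn-taking idea can be sketched as follows. This is only an illustration and not our algorithm as published: it is plain tabular Q-Learning in which the agent that acts is chosen in round-robin order, and each agent bootstraps from the table of the agent that acts next, which is an assumption made for this example. The one-dimensional toy task (`toy_step`), the action set, and all parameter values are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = (+1, -1)  # hypothetical actions: move one cell right or left


def toy_step(state, agent, action):
    """Hypothetical toy task on cells 0..4: only `agent` moves.

    The shared reward is 1 when agent 0 reaches cell 3, which ends the episode.
    """
    positions = list(state)
    positions[agent] = min(4, max(0, positions[agent] + action))
    next_state = tuple(positions)
    done = next_state[0] == 3
    return next_state, (1.0 if done else 0.0), done


def train(episodes, n_agents=2, start=(0, 1), alpha=0.1, gamma=0.9,
          epsilon=0.1, max_steps=50, seed=0):
    rng = random.Random(seed)
    # One Q-table per agent, indexed by (full state, own action).
    q_tables = [defaultdict(float) for _ in range(n_agents)]
    for _ in range(episodes):
        state = start
        for t in range(max_steps):
            agent = t % n_agents             # round-robin: one agent per step
            next_agent = (t + 1) % n_agents
            q = q_tables[agent]
            # Epsilon-greedy selection over the acting agent's own actions.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = toy_step(state, agent, action)
            # Assumption: bootstrap from the agent that will act next.
            best_next = 0.0 if done else max(
                q_tables[next_agent][(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q_tables


q_tables = train(episodes=200)
```

Because only one agent moves per step, each update involves a single agent's action set and never the joint action space.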

Modular Multi-Agent Reinforcement Learning approach to L-MCRS systems (2010-2011)

In this section, agents are only aware of their neighbors' positions and no coordination mechanism is used. The results come from our simulations of 6 physically linked robots using a Modular Reinforcement Learning system.

Local goals

From an initial position of the hose, the agents must reach a final configuration (green). Each of the robots has its own local goal.

Successful episodes

Failed episodes

Team goal

All agents share the reward signal. They all receive a positive reinforcement when the tip of the hose is carried to the goal.

Successful episodes

Failed episodes

Consensus-based approach to L-MCRS systems

These are some examples of real-life experiments on the hose transportation problem. The robot detection and control software runs on a PC. Red dots represent the references (where the robots "should be") and green dots the positions computed from the overhead camera images (where "they are"). Commands are sent to the robots using radio transceivers:

A) Non-Linked Robots

No physical links are used and the robots perform relatively well. Due to communication errors, delays, servo inaccuracies and the nature of PI controllers, the robots oscillate around the path.
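For readers unfamiliar with PI control, a minimal discrete-time sketch is given below. It is not the controller running on the PC in these experiments: the class name, the gains, and the time step are assumptions chosen for illustration. The reference plays the role of a red dot and the measurement the role of a green dot.

```python
class PIController:
    """Minimal discrete PI controller (illustrative, hypothetical gains)."""

    def __init__(self, kp, ki, dt):
        self.kp = kp          # proportional gain
        self.ki = ki          # integral gain
        self.dt = dt          # control period in seconds
        self.integral = 0.0   # accumulated error

    def update(self, reference, measurement):
        """Return the command for the current reference and measurement."""
        error = reference - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


controller = PIController(kp=2.0, ki=0.5, dt=0.1)
command = controller.update(reference=1.0, measurement=0.0)
```

The integral term keeps a nonzero command even when the instantaneous error is zero, which is one reason why a PI loop combined with delayed measurements can overshoot and oscillate around the path.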

B) Linked Robots

Steering is worse because the physical link introduces traction effects on the system. For the same reason, the robots take longer to catch up with the references.

  • B.2 Max. tangential speed for last robot was limited (50%). References move full-speed. (media:2010.5.run5.avi)

The last robot is forced to move slower than the rest and, because of this, the robots cannot catch up with the references. The error propagates across the system.

  • B.3 Max. tangential speed for last robot was limited (50%). References move at 75% speed. (media:2010.5.run6.avi)

The last robot is again forced to move at half speed and the references move at 75% speed, yet the robots still cannot follow the path acceptably.

  • B.4 Max. tangential speed for last robot was limited (50%). References move at 50% speed. (media:2010.5.run7.avi)

The last robot runs at half speed and the references move at half speed too. This shows that when every robot can move at least as fast as the references, the overall system behaves better, regardless of the differences in maximum speed between the robots. Near the end of the path, the traction forces between the robots exceed the forces the robots can apply, and they cannot steer correctly.

One interesting property of physically linked multicomponent robotic systems is fault tolerance. In this run, the last robot remains switched off and the robots still follow the path acceptably well. The switched-off robot makes following the path harder for the rest.

This time the references move more slowly, allowing the robots to catch up with them faster. The last robot is switched off, which makes the rest behave worse.