Overview

How Agents Can Learn In Environments With No Rewards

What you'll learn:

How to Code A3C Agents
How to Do Parallel Processing in Python
How to Implement Deep Reinforcement Learning Papers
How to Code the Intrinsic Curiosity Module

If reinforcement learning is to serve as a viable path to artificial general intelligence, it must learn to cope with environments with sparse or totally absent rewards. Most real life systems provided rewards that only occur after many time steps, leaving the agent with little information to build a successful policy on. Curiosity based reinforcement learning solves this problem by giving the agent an innate sense of curiosity about its world, enabling it to explore and learn successful policies for navigating the world.

In this advanced course on deep reinforcement learning, motivated students will learn how to implement cutting edge artificial intelligence research papers from scratch. This is a fast paced course for those that are experienced in coding up actor critic agents on their own. We'll code up two papers in this course, using the popular PyTorch framework.

The first paper covers asynchronous methods for deep reinforcement learning; also known as the popular asynchronous advantage actor critic algorithm (A3C). Here students will discover a new framework for learning that doesn't require a GPU. We will learn how to implement multithreading in Python and use that to train multiple actor critic agents in parallel. We will go beyond the basic implementation from the paper and implement a recent improvement to reinforcement learning known as generalized advantage estimation. We will test our agents in the Pong environment from the Open AIGym's Atari library, and achieve nearly world class performance in just a few hours.

From there, we move on to the heart of the course:learning in environments with sparse or totally absent rewards. This new paradigm leverages the agent's curiosity about the environment as an intrinsic reward that motivates the agent to explore and learn generalizable skills. We'll implement the intrinsic curiosity module (ICM), which is a bolt-on module for any deep reinforcement learning algorithm. We will train and test our agent in an maze like environment that only yields rewards when the agent reaches the objective. Aclear performance gain over the vanilla A3C algorithm will be demonstrated, conclusively showing the power of curiosity driven deep reinforcement learning.

Please keep in mind this is a fast paced course for motivated and advanced students. There will be only a very brief review of the fundamental concepts of reinforcement learning and actor critic methods, and from there we will jump right into reading and implementing papers.

The beauty of both the ICM and asynchronous methods is that these paradigms can be applied to nearly any other reinforcement learning algorithm. Both are highly adaptable and can be plugged in with little modification to algorithms like proximal policy optimization, soft actor critic, or deep Q learning.

Students will learn how to: