Playing hard exploration games by watching YouTube

Jun 1, 2018

Playing hard exploration games by watching YouTube

Aytar, Yusuf;
Pfaff, Tobias;
Budden, David;
Paine, Tom Le;
Wang, Ziyu;
de Freitas, Nando;

Abstract: Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.

https://arxiv.org/abs/1805.11592v1

← Back to all articles Quick Navigation: Next:[ j ] – Prev:[ k ] – List:[ l ]

Brain Networks Laboratory (Choe Lab)

Playing hard exploration games by watching YouTube

Playing hard exploration games by watching YouTube