
f-Policy Gradients: A General Framework for Goal-Conditioned RL using f-Divergences

Siddhant Agarwal*, Ishan Durugkar, Amy Zhang*, Peter Stone

* External authors

NeurIPS 2023

Abstract

Goal-conditioned RL problems provide sparse rewards: the agent receives a reward signal only when it has achieved the goal, which makes exploration difficult. Several works augment this sparse reward with a learned dense reward function, but this can lead to suboptimal exploration and misalignment with the task. Moreover, recent works have demonstrated that effective shaping rewards for a particular problem can depend on the underlying learning algorithm. Our work ($f$-PG, or $f$-Policy Gradients) shows that minimizing an $f$-divergence between the agent's state visitation distribution and the goal distribution can yield an optimal policy. We derive gradients for various $f$-divergences to optimize this objective, which provides dense learning signals for exploration in sparse reward settings. We further show that entropy-maximizing policy optimization for commonly used metric-based shaping rewards, such as L2 and temporal distance, reduces to special cases of $f$-divergence minimization, providing common ground for studying such metric-based shaping rewards. We compare $f$-Policy Gradients with standard policy gradient methods on a challenging gridworld as well as the Point Maze environments.
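
To make the objective concrete, here is a minimal numeric sketch (not the paper's implementation) of an $f$-divergence between a state visitation distribution and a goal distribution over a small tabular state space. It assumes the common generator convention $D_f(p \| q) = \sum_s q(s)\, f(p(s)/q(s))$; the distributions, helper names, and numbers below are illustrative assumptions.

```python
# Minimal sketch of the f-divergence objective over a tabular state space.
# Convention assumed here: D_f(p || q) = sum_s q(s) * f(p(s) / q(s)),
# with f convex and f(1) = 0. All names and numbers are illustrative only.
import numpy as np

def f_divergence(p, q, f, eps=1e-12):
    """Compute D_f(p || q) for discrete distributions p and q."""
    ratio = (p + eps) / (q + eps)  # eps guards against division by / log of zero
    return float(np.sum(q * f(ratio)))

# A few standard generators.
generators = {
    "forward_kl":  lambda t: t * np.log(t),    # gives KL(p || q)
    "reverse_kl":  lambda t: -np.log(t),       # gives KL(q || p)
    "chi_squared": lambda t: (t - 1.0) ** 2,   # gives Pearson chi-squared
}

d_pi   = np.array([0.50, 0.30, 0.15, 0.05])  # agent's state visitation distribution
p_goal = np.array([0.01, 0.01, 0.01, 0.97])  # goal distribution (mass on the goal state)

for name, f in generators.items():
    print(f"{name:12s} D_f(d_pi || p_goal) = {f_divergence(d_pi, p_goal, f):.4f}")
```

As the policy drives its visitation distribution toward the goal distribution, each of these divergences shrinks toward zero, so a policy-gradient method can use per-state terms derived from the generator's derivative $f'$ as a dense learning signal even before the goal is ever reached.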

Related Publications

N-agent Ad Hoc Teamwork

NeurIPS, 2024
Caroline Wang*, Arrasy Rahman*, Ishan Durugkar, Elad Liebman*, Peter Stone

Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algor…

Discovering Creative Behaviors through DUPLEX: Diverse Universal Features for Policy Exploration

NeurIPS, 2024
Borja G. Leon*, Francesco Riccio, Kaushik Subramanian, Pete Wurman, Peter Stone

The ability to approach the same problem from different angles is a cornerstone of human intelligence that leads to robust solutions and effective adaptation to problem variations. In contrast, current RL methodologies tend to lead to policies that settle on a single solutio…

A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo

RLC, 2024
Miguel Vasco*, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Pete Wurman, Peter Stone

Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics. Recently, an end-to-end deep reinforcement learning agent met this challenge in a high-fidelity racing simulator, Gran Tu…
