Authors

* External authors

Venue

Date

Share

N-agent Ad Hoc Teamwork

Caroline Wang*

Arrasy Rahman*

Ishan Durugkar

Elad Liebman*

Peter Stone

* External authors

NeurIPS 2024

2024

Abstract

Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algorithm usually assumes control over only a single agent in the scenario. However, many cooperative settings in the real world are much less restrictive. For example, in an autonomous driving scenario, a company might train its cars with the same learning algorithm, yet once on the road, these cars must cooperate with cars from another company. Towards expanding the class of scenarios that cooperative learning methods may optimally address, we introduce -agent ad hoc teamwork (NAHT), where a set of autonomous agents must interact and cooperate with dynamically varying numbers and types of teammates. This paper formalizes the problem, and proposes the Policy Optimization with Agent Modelling (POAM) algorithm. POAM is a policy gradient, multi-agent reinforcement learning approach to the NAHT problem, that enables adaptation to diverse teammate behaviors by learning representations of teammate behaviors. Empirical evaluation on tasks from the multi-agent particle environment and StarCraft II shows that POAM improves cooperative task returns compared to baseline approaches, and enables out-of-distribution generalization to unseen teammates.

Related Publications

Discovering Creative Behaviors through DUPLEX: Diverse Universal Features for Policy Exploration

NeurIPS, 2024
Borja G. Leon*, Francesco Riccio, Kaushik Subramanian, Pete Wurman, Peter Stone

The ability to approach the same problem from different angles is a cornerstone of human intelligence that leads to robust solutions and effective adaptation to problem variations. In contrast, current RL methodologies tend to lead to policies that settle on a single solutio…

A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo

RLC, 2024
Miguel Vasco*, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Pete Wurman, Peter Stone

Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics. Recently, an end-to-end deep reinforcement learning agent met this challenge in a high-fidelity racing simulator, Gran Tu…

Wait That Feels Familiar: Learning to Extrapolate Human Preferences for Preference-Aligned Path Planning.

ICRA, 2024
Haresh Karnan*, Elvin Yang*, Garrett Warnell*, Joydeep Biswas*, Peter Stone

Autonomous mobility tasks such as lastmile delivery require reasoning about operator indicated preferences over terrains on which the robot should navigate to ensure both robot safety and mission success. However, coping with out of distribution data from novel terrains or a…

JOIN US

Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.