N-agent Ad Hoc Teamwork

Caroline Wang*

Arrasy Rahman*

Ishan Durugkar

Elad Liebman*

Peter Stone

* External authors

NeurIPS 2024

Abstract

Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algorithm usually assumes control over only a single agent. However, many cooperative settings in the real world are much less restrictive. For example, in an autonomous driving scenario, a company might train its cars with the same learning algorithm, yet once on the road, these cars must cooperate with cars from another company. Towards expanding the class of scenarios that cooperative learning methods may optimally address, we introduce N-agent ad hoc teamwork (NAHT), where a set of autonomous agents must interact and cooperate with dynamically varying numbers and types of teammates. This paper formalizes the problem and proposes the Policy Optimization with Agent Modelling (POAM) algorithm. POAM is a policy gradient, multi-agent reinforcement learning approach to the NAHT problem that enables adaptation to diverse teammate behaviors by learning representations of those behaviors. Empirical evaluation on tasks from the multi-agent particle environment and StarCraft II shows that POAM improves cooperative task returns compared to baseline approaches and enables out-of-distribution generalization to unseen teammates.
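To make the agent-modelling idea concrete, below is a minimal sketch of how a teammate-conditioned policy could be structured: an encoder compresses an agent's local observation history into an embedding summarizing current teammate behavior, and the policy conditions on that embedding when selecting actions. The module names (`TeammateEncoder`, `ConditionedPolicy`), network sizes, and training details are illustrative assumptions, not the authors' released POAM implementation.

```python
# Illustrative sketch (not the official POAM code): a recurrent encoder
# produces a teammate embedding from local observations, and the policy
# conditions on it to adapt to varying teammate types.
import torch
import torch.nn as nn


class TeammateEncoder(nn.Module):
    """Encodes a trajectory of local observations into a fixed-size
    embedding that summarizes the current teammates' behavior."""

    def __init__(self, obs_dim: int, embed_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, embed_dim)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim)
        _, h = self.gru(obs_seq)           # h: (1, batch, hidden_dim)
        return self.head(h.squeeze(0))     # (batch, embed_dim)


class ConditionedPolicy(nn.Module):
    """Policy whose action distribution is conditioned on the teammate
    embedding, so behavior can change with the inferred teammate type."""

    def __init__(self, obs_dim: int, embed_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor, embed: torch.Tensor):
        logits = self.net(torch.cat([obs, embed], dim=-1))
        return torch.distributions.Categorical(logits=logits)


# Usage: embed the observation history so far, then act at the current step.
encoder = TeammateEncoder(obs_dim=32, embed_dim=16)
policy = ConditionedPolicy(obs_dim=32, embed_dim=16, n_actions=5)
history = torch.randn(1, 10, 32)   # 10 steps of local observations
action = policy(history[:, -1], encoder(history)).sample()
```

In a policy gradient setup like the one the abstract describes, both modules would be trained end-to-end on cooperative task return, so the embedding is shaped to capture whatever about teammate behavior is useful for control.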

