* External authors




DM2: Distributed Multi-Agent Reinforcement Learning via Distribution Matching

Caroline Wang*

Ishan Durugkar*

Elad Liebman*

Peter Stone

AAAI 2023



Current approaches to multi-agent cooperation rely heavily on centralized mechanisms or explicit communication protocols to ensure convergence. This paper studies distributed multi-agent learning without centralized components or explicit communication, and examines distribution matching as a way to coordinate independent agents. In the proposed scheme, each agent independently minimizes the distribution mismatch to the corresponding component of a target visitation distribution. The theoretical analysis shows that, under certain conditions, when each agent minimizes its individual distribution mismatch, the agents converge to the joint policy that generated the target distribution. Further, if the target distribution comes from a joint policy that optimizes a cooperative task, the optimal policy for a combination of this task reward and the distribution matching reward is the same joint policy. This insight is used to formulate a practical algorithm (DM2), in which each agent matches a target distribution derived from concurrently sampled trajectories from a joint expert policy. Experimental validation on the StarCraft domain shows that combining (1) a task reward and (2) a distribution matching reward for expert demonstrations of the same task allows agents to outperform a naive distributed baseline. Additional experiments probe the conditions under which expert demonstrations need to be sampled to obtain the learning benefits.
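
The per-agent idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not say which divergence or estimator DM2 uses, so the sketch assumes discrete states, an empirical state-visitation distribution, and the Jensen-Shannon divergence as the mismatch measure. The names `visitation_distribution`, `js_divergence`, `mixed_reward`, and the `weight` parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def visitation_distribution(trajectory, num_states):
    """Empirical state-visitation distribution of one agent's trajectory.

    Assumption: states are integers in range(num_states).
    """
    counts = Counter(trajectory)
    total = len(trajectory)
    return [counts.get(s, 0) / total for s in range(num_states)]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between discrete distributions.

    Assumption: JS divergence stands in for the unspecified mismatch measure.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero mass in x contribute nothing to the sum.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mixed_reward(task_reward, mismatch, weight=1.0):
    """Hypothetical per-agent signal: task reward minus a weighted mismatch."""
    return task_reward - weight * mismatch


# Each agent compares only its own visitation with its own component of the
# target; no agent reads another agent's data.
num_states = 3
expert = {"agent_0": [0, 0, 1, 2], "agent_1": [2, 2, 1, 0]}
learner = {"agent_0": [0, 1, 1, 2], "agent_1": [2, 2, 1, 0]}

for name in expert:
    target = visitation_distribution(expert[name], num_states)
    current = visitation_distribution(learner[name], num_states)
    mismatch = js_divergence(current, target)
    print(name, round(mismatch, 4), round(mixed_reward(1.0, mismatch), 4))
```

In this toy setup an agent whose visitation already equals its target component has zero mismatch and keeps the full task reward, while any deviation lowers its combined signal. A learned estimator would be needed in continuous or high-dimensional state spaces; that choice is outside what the abstract specifies.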

Related Publications

A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

Neural Networks, 2023
Megan M. Baker*, Alexander New*, Mario Aguilar-Simon*, Ziad Al-Halah*, Sébastien M. R. Arnold*, Ese Ben-Iwhiwhu*, Andrew P. Brna*, Ethan Brooks*, Ryan C. Brown*, Zachary Daniels*, Anurag Daram*, Fabien Delattre*, Ryan Dellana*, Eric Eaton*, Haotian Fu*, Kristen Grauman*, Jesse Hostetler*, Shariq Iqbal*, Cassandra Kent*, Nicholas Ketz*, Soheil Kolouri*, George Konidaris*, Dhireesha Kudithipudi*, Seungwon Lee*, Michael L. Littman*, Sandeep Madireddy*, Jorge A. Mendez*, Eric Q. Nguyen*, Christine D. Piatko*, Praveen K. Pilly*, Aswin Raghavan*, Abrar Rahman*, Santhosh Kumar Ramakrishnan*, Neale Ratzlaff*, Andrea Soltoggio*, Peter Stone, Indranil Sur*, Zhipeng Tang*, Saket Tiwari*, Kyle Vedder*, Felix Wang*, Zifan Xu*, Angel Yanguas-Gil*, Harel Yedidsion*, Shangqun Yu*, Gautam K. Vallabha*

Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to “real world” events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and syst…

Reward (Mis)design for autonomous driving

Artificial Intelligence, 2023
W. Bradley Knox*, Alessandro Allievi*, Holger Banzhaf*, Felix Schmitt*, Peter Stone

This article considers the problem of diagnosing certain common errors in reward design. Its insights are also applicable to the design of cost functions and performance metrics more generally. To diagnose common errors, we develop 8 simple sanity checks for identifying flaw…

Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning

AAAI, 2023
Bo Liu*, Yihao Feng*, Qiang Liu*, Peter Stone

Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications, including manipulation and navigation problems in robotics. Especially in such robotics tasks, sample efficiency is of the utmost importance for GCRL since, by default, the …
