Peter Stone
Profile
Peter is the Chief Scientist of Sony AI. He is also the founder and director of the Learning Agents Research Group (LARG) within the Artificial Intelligence Laboratory in the Department of Computer Science at The University of Texas at Austin, as well as Department Chair and Founding Director of Texas Robotics. In 2013 he was awarded the University of Texas System Regents' Outstanding Teaching Award, and in 2014 he was inducted into the UT Austin Academy of Distinguished Teachers, earning him the title of University Distinguished Teaching Professor. Professor Stone's research interests in Artificial Intelligence include machine learning (especially reinforcement learning), multiagent systems, and robotics. Professor Stone received his Ph.D. in Computer Science in 1998 from Carnegie Mellon University. From 1999 to 2002 he was a Senior Technical Staff Member in the Artificial Intelligence Principles Research Department at AT&T Labs - Research. He is an Alfred P. Sloan Research Fellow, Guggenheim Fellow, AAAI Fellow, IEEE Fellow, AAAS Fellow, ACM Fellow, Fulbright Scholar, and 2004 ONR Young Investigator. In 2007 he received the prestigious IJCAI Computers and Thought Award, given biennially to the top AI researcher under the age of 35; in 2016 he received the ACM/SIGAI Autonomous Agents Research Award; and in 2024 he received the ACM/AAAI Allen Newell Award.
Publications
ProtoCRL: Prototype-based Network for Continual Reinforcement Learning
RLC, 2025 | Michela Proietti*, Peter R. Wurman, Peter Stone, Roberto Capobianco
The purpose of continual reinforcement learning is to train an agent on a sequence of tasks such that it learns the ones that appear later in the sequence while retaining the ability to perform the tasks that appeared earlier. Experience replay is a popular method used to mak...
Automated Reward Design for Gran Turismo
NEURIPS, 2025 | Michel Ma, Takuma Seno, Kaushik Subramanian, Peter R. Wurman, Peter Stone, Craig Sherstan
When designing reinforcement learning (RL) agents, a designer communicates the desired agent behavior through the definition of reward functions - numerical feedback given to the agent as reward or punishment for its actions. However, mapping desired behaviors to reward func...
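The abstract describes reward functions as numerical feedback a designer must hand-craft. A toy example makes the difficulty concrete: a hypothetical racing reward as a weighted sum of terms, where every weight is something automated reward design would otherwise have to search over (the names and weights here are illustrative, not from the paper):

```python
def racing_reward(progress, collision, off_track,
                  w_progress=1.0, w_collision=5.0, w_offtrack=2.0):
    """Hypothetical hand-designed racing reward: reward track progress,
    penalize collisions and leaving the track. The weights are exactly
    what a human designer must tune by trial and error."""
    return (w_progress * progress
            - w_collision * float(collision)
            - w_offtrack * float(off_track))

# A step with good progress but an off-track excursion nets a penalty.
r = racing_reward(progress=0.8, collision=False, off_track=True)  # 0.8 - 2.0 = -1.2
```

Even this three-term sketch illustrates the mapping problem the abstract raises: small changes to the weights can produce very different agent behavior.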
Proto Successor Measure: Representing the Space of All Possible Solutions of Reinforcement Learning
ICML, 2025 | Siddhant Agarwal*, Harshit Sikchi, Peter Stone, Amy Zhang*
Having explored an environment, intelligent agents should be able to transfer their knowledge to most downstream tasks within that environment. Referred to as "zero-shot learning," this ability remains elusive for general-purpose reinforcement learning algorithms. While rec...
Hyperspherical Normalization for Scalable Deep Reinforcement Learning
ICML, 2025 | Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, Jaegul Choo
Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstab...
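The core operation suggested by the title, constraining weights to a fixed-norm hypersphere, can be sketched in a few lines of NumPy. This is a generic illustration of the idea only; the paper's actual normalization scheme may differ in detail:

```python
import numpy as np

def project_to_hypersphere(w, radius=1.0, axis=-1, eps=1e-8):
    """Rescale each weight vector to a fixed L2 norm, so that training
    effectively optimizes directions on a hypersphere rather than
    unbounded magnitudes (a generic sketch of the idea)."""
    norm = np.linalg.norm(w, axis=axis, keepdims=True)
    return radius * w / (norm + eps)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))      # 4 weight vectors of dimension 8
W = project_to_hypersphere(W)        # apply after each gradient update
row_norms = np.linalg.norm(W, axis=-1)
```

Re-projecting after every update bounds the effective weight magnitude, which is one way to tame the non-stationarity the abstract points to.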
A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7
RA-L, 2025 | Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter R. Wurman
Deep reinforcement learning has achieved superhuman racing performance in high-fidelity simulators like Gran Turismo 7 (GT7). It typically utilizes global features that require instrumentation external to a car, such as precise localization of agents and opponents, limiting ...
Argus: A Compact and Versatile Foundation Model for Vision
CVPR, 2025 | Weiming Zhuang, Chen Chen, Zhizhong Li, Sina Sajadmanesh, Jingtao Li, Jiabo Huang, Vikash Sehwag, Vivek Sharma, Hirotaka Shinozaki, Felan Carlo Garcia, Yihao Zhan, Naohiro Adachi, Ryoji Eki, Michael Spranger, Peter Stone, Lingjuan Lyu
While existing vision and multi-modal foundation models can handle multiple computer vision tasks, they often suffer from significant limitations, including huge demand for data and computational resources during training and inconsistent performance across vision tasks at d...
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
ICLR, 2025 | Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno
Recent advances in CV and NLP have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simpli...
Dobby: A Conversational Service Robot Driven by GPT-4
RO-MAN, 2025 | Carson Stark, Bohkyung Chun, Casey Charleston, Varsha Ravi, Luis Pabon, Surya Sunkari, Tarun Mohan, Peter Stone, Justin Hart*
This work introduces a robotics platform which embeds a conversational AI agent in an embodied system for natural language understanding and intelligent decision-making for service tasks; integrating task planning and human-like conversation. The agent is derived from a larg...
Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning
NEURIPS, 2025 | Jiaheng Hu*, Roberto Martin-Martin*, Peter Stone, Zizhao Wang*
A hallmark of intelligent agents is the ability to learn reusable skills purely from unsupervised interaction with the environment. However, existing unsupervised skill discovery methods often learn entangled skills where one skill variable simultaneously influences many ent...
SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions
NEURIPS, 2025 | Zizhao Wang*, Jiaheng Hu*, Roberto Martin-Martin*, Amy Zhang*, Scott Niekum*, Peter Stone, Caleb Chuck, Stephen Chen
Unsupervised skill discovery carries the promise that an intelligent agent can learn reusable skills through autonomous, reward-free environment interaction. Existing unsupervised skill discovery methods learn skills by encouraging distinguishable behaviors that cover divers...
Learning to Look: Seeking Information for Decision Making via Policy Factorization
CORL, 2025 | Jiaheng Hu*, Peter Stone, Roberto Martin-Martin*, Ben Abbatematteo, Shivin Dass
Many robot manipulation tasks require active or interactive exploration behavior in order to be performed successfully. Such tasks are ubiquitous in embodied domains, where agents must actively search for the information necessary for each stage of a task, e.g., moving the h...
LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning
EMNLP, 2025 | Zifan Xu*, Peter Stone, Dmitriy Bespalov, Haozhu Wang, Xian Wu, Yanjun Qi
Chain-of-thought (CoT) prompting is a popular in-context learning (ICL) approach for large language models (LLMs), especially when tackling complex reasoning tasks. Traditional ICL approaches construct prompts using examples that contain questions similar to the input questio...
N-agent Ad Hoc Teamwork
NEURIPS, 2024 | Caroline Wang*, Arrasy Rahman*, Ishan Durugkar, Elad Liebman*, Peter Stone
Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algor...
Discovering Creative Behaviors through DUPLEX: Diverse Universal Features for Policy Exploration
NEURIPS, 2024 | Borja G. Leon*, Francesco Riccio, Kaushik Subramanian, Pete Wurman, Peter Stone
The ability to approach the same problem from different angles is a cornerstone of human intelligence that leads to robust solutions and effective adaptation to problem variations. In contrast, current RL methodologies tend to lead to policies that settle on a single solutio...
A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo
RLC, 2024 | Miguel Vasco*, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Pete Wurman, Peter Stone
Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics. Recently, an end-to-end deep reinforcement learning agent met this challenge in a high-fidelity racing simulator, Gran Tu...
Wait That Feels Familiar: Learning to Extrapolate Human Preferences for Preference-Aligned Path Planning.
ICRA, 2024 | Haresh Karnan*, Elvin Yang*, Garrett Warnell*, Joydeep Biswas*, Peter Stone
Autonomous mobility tasks such as last-mile delivery require reasoning about operator-indicated preferences over terrains on which the robot should navigate to ensure both robot safety and mission success. However, coping with out-of-distribution data from novel terrains or a...
Now, Later, and Lasting: 10 Priorities for AI Research, Policy, and Practice.
COACM, 2024 | Eric Horvitz*, Vincent Conitzer*, Sheila McIlraith*, Peter Stone
Advances in artificial intelligence (AI) will transform many aspects of our lives and society, bringing immense opportunities but also posing significant risks and challenges. The next several decades may well be a turning point for humanity, comparable to the industrial rev...
Rethinking Social Robot Navigation: Leveraging the Best of Two Worlds
ICRA, 2024 | Amir Hossain Raj*, Zichao Hu*, Haresh Karnan*, Rohan Chandra*, Amirreza Payandeh*, Luisa Mao*, Peter Stone, Joydeep Biswas*, Xuesu Xiao*
Empowering robots to navigate in a socially compliant manner is essential for the acceptance of robots moving in human-inhabited environments. Previously, roboticists have developed geometric navigation systems with decades of empirical validation to achieve safety and effic...
The Human in the Loop: Perspectives and Challenges for RoboCup 2050.
AR, 2024 | Alessandra Rossi*, Maike Paetzel-Prüsmann*, Merel Keijsers*, Michael Anderson*, Susan Leigh Anderson*, Daniel Barry*, Jan Gutsche*, Justin Hart*, Luca Iocchi*, Ainse Kokkelmans*, Wouter Kuijpers*, Yun Liu*, Daniel Polani*, Caleb Roscon*, Marcus Scheunemann*, Peter Stone, Florian Vahl*, René van de Molengraft*, Oskar von Stryk*
Robotics researchers have been focusing on developing autonomous and human-like intelligent robots that are able to plan, navigate, manipulate objects, and interact with humans in both static and dynamic environments. These capabilities, however, are usually developed for di...
Real-time Trajectory Generation via Dynamic Movement Primitives for Autonomous Racing
ACC, 2024 | Catherine Weaver*, Roberto Capobianco, Peter R. Wurman, Peter Stone, Masayoshi Tomizuka*
We employ sequences of high-order motion primitives for efficient online trajectory planning, enabling competitive racecar control even when the car deviates from an offline demonstration. Dynamic Movement Primitives (DMPs) utilize a target-driven non-linear differential equ...
Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning
AAAI, 2024 | Zizhao Wang*, Caroline Wang*, Xuesu Xiao*, Yuke Zhu*, Peter Stone
Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach towards achieving both goals is ...
Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents
AAAI, 2024 | Arrasy Rahman*, Jiaxun Cui*, Peter Stone
Robustly cooperating with unseen agents and human partners presents significant challenges due to the diverse cooperative conventions these partners may adopt. Existing Ad Hoc Teamwork (AHT) methods address this challenge by training an agent with a population of diverse tea...
Learning Optimal Advantage from Preferences and Mistaking it for Reward
AAAI, 2024 | W. Bradley Knox*, Stephane Hatgis-Kessell*, Sigurdur Orn Adalgeirsson*, Serena Booth*, Anca Dragan*, Peter Stone, Scott Niekum*
We consider algorithms for learning reward functions from human preferences over pairs of trajectory segments---as used in reinforcement learning from human feedback (RLHF)---including those used to fine tune ChatGPT and other contemporary language models. Most recent work o...
Asynchronous Task Plan Refinement for Multi-Robot Task and Motion Planning
AAAI, 2024 | Yoonchang Sung*, Rahul Shome*, Peter Stone
This paper explores general multi-robot task and motion planning, where multiple robots in close proximity manipulate objects while satisfying constraints and a given goal. In particular, we formulate the plan refinement problem--which, given a task plan, finds valid assignm...
VaryNote: A Method to Automatically Vary the Number of Notes in Symbolic Music
CMMR, 2023 | Juan M. Huerta*, Bo Liu*, Peter Stone
Automatically varying the number of notes in symbolic music has various applications in assisting music creators to embellish simple tunes or to reduce complex music to its core idea. In this paper, we formulate the problem of varying the number of notes while preserving the...
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
NEURIPS, 2023 | Bo Liu*, Yifeng Zhu*, Chongkai Gao*, Yihao Feng*, Qiang Liu*, Yuke Zhu*, Peter Stone
Lifelong learning offers a promising paradigm of building a generalist agent that learns and adapts over its lifespan. Unlike traditional lifelong learning problems in image and text domains, which primarily involve the transfer of declarative knowledge of entities and conce...
FAMO: Fast Adaptive Multitask Optimization
NEURIPS, 2023 | Bo Liu*, Yihao Feng*, Peter Stone, Qiang Liu*
One of the grand enduring goals of AI is to create generalist agents that can learn multiple different tasks from diverse data via multitask learning (MTL). However, gradient descent (GD) on the average loss across all tasks may yield poor multitask performance due to severe...
Elden: Exploration via Local Dependencies
NEURIPS, 2023 | Zizhao Wang*, Jiaheng Hu*, Roberto Martin-Martin*, Peter Stone
Tasks with large state space and sparse reward present a longstanding challenge to reinforcement learning. In these tasks, an agent needs to explore the state space efficiently until it finds reward: the hard exploration problem. To deal with this problem, the community has ...
f-Policy Gradients: A General Framework for Goal-Conditioned RL using f-Divergences
NEURIPS, 2023 | Siddhant Agarwal*, Ishan Durugkar, Amy Zhang*, Peter Stone
Goal-Conditioned RL problems provide sparse rewards where the agent receives a reward signal only when it has achieved the goal, making exploration a difficult problem. Several works augment this sparse reward with a learned dense reward function, but this can lead to subopt...
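The sparse-reward setting the abstract refers to is the standard goal-conditioned formulation: reward is given only at the goal. A toy version (illustrative names, not code from the paper) shows why exploration is hard, as almost every transition returns zero signal:

```python
import numpy as np

def sparse_goal_reward(state, goal, tol=0.05):
    """Standard sparse goal-conditioned reward: 1 when the agent is
    within `tol` of the goal, 0 everywhere else."""
    dist = np.linalg.norm(np.asarray(state) - np.asarray(goal))
    return 1.0 if dist <= tol else 0.0

r_near = sparse_goal_reward([0.0, 0.0], [0.0, 0.04])  # within tolerance -> 1.0
r_far = sparse_goal_reward([0.0, 0.0], [1.0, 1.0])    # far from goal   -> 0.0
```

Learned dense surrogates replace this step function with a smooth signal, which is where the suboptimality the abstract warns about can creep in.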
A Novel Control Law for Multi-joint Human-Robot Interaction Tasks While Maintaining Postural Coordination
IROS, 2023 | Keya Ghonasgi*, Reuth Mirsky*, Adrian M Haith*, Peter Stone, Ashish D Deshpande*
Exoskeleton robots are capable of safe torque-controlled interactions with a wearer while moving their limbs through pre-defined trajectories. However, affecting and assisting the wearer's movements while incorporating their inputs (effort and movements) effectively during a...
Symbolic State Space Optimization for Long Horizon Mobile Manipulation Planning
IROS, 2023 | Xiaohan Zhang*, Yifeng Zhu*, Yan Ding*, Yuqian Jiang*, Yuke Zhu*, Peter Stone, Shiqi Zhang*
In existing task and motion planning (TAMP) research, it is a common assumption that experts manually specify the state space for task-level planning. A well-developed state space enables the desirable distribution of limited computational resources between task planning and...
Event Tables for Efficient Experience Replay
COLLAS, 2023 | Varun Kompella, Thomas Walsh, Samuel Barrett, Peter R. Wurman, Peter Stone
Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. This paper introduces Stratified Sampling from Event Tables (SSET), whi...
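The contrast the abstract draws, uniform sampling versus stratified sampling from event tables, can be sketched with a minimal buffer that keeps flagged "event" transitions in a separate table and mixes them into every batch. This is a hypothetical illustration of stratified replay in general, not the paper's SSET implementation:

```python
import random

class StratifiedReplayBuffer:
    """Toy replay buffer with a separate table for 'event' transitions
    (hypothetical names; a sketch of stratified sampling, not SSET)."""

    def __init__(self, event_fraction=0.5):
        self.default_table = []   # ordinary transitions
        self.event_table = []     # transitions flagged as important events
        self.event_fraction = event_fraction

    def add(self, transition, is_event=False):
        (self.event_table if is_event else self.default_table).append(transition)

    def sample(self, batch_size):
        # Reserve a fixed fraction of each batch for event transitions,
        # so rare but important experiences are not drowned out.
        n_event = min(int(batch_size * self.event_fraction), len(self.event_table))
        batch = random.sample(self.event_table, n_event)
        batch += random.choices(self.default_table, k=batch_size - n_event)
        return batch

buf = StratifiedReplayBuffer()
for t in range(100):
    buf.add(("s", "a", 0.0, "s_next"), is_event=(t % 10 == 0))  # 1 in 10 is an event
batch = buf.sample(8)
```

With uniform sampling, events would appear in roughly 10% of samples; stratification guarantees them a fixed share of every batch.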
Composing Efficient, Robust Tests for Policy Selection
UAI, 2023 | Dustin Morrill, Thomas Walsh, Daniel Hernandez, Peter R. Wurman, Peter Stone
Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. We introduce RPOSST, a...
Model-Based Meta Automatic Curriculum Learning.
COLLAS, 2023 | Zifan Xu*, Yulin Zhang*, Shahaf S. Shperberg*, Reuth Mirsky*, Yuqian Jiang*, Bo Liu*, Peter Stone
Curriculum learning (CL) has been widely explored to facilitate the learning of hard-exploration tasks in reinforcement learning (RL) by training a sequence of easier tasks, often called a curriculum. While most curricula are built either manually or automatically based on h...
Improving Artificial Intelligence with Games
SCIENCE, 2023 | Peter R. Wurman, Peter Stone, Michael Spranger
Games continue to drive progress in the development of artificial intelligence.
Causal Policy Gradient for Whole-Body Mobile Manipulation
RSS, 2023 | Jiaheng Hu*, Peter Stone, Roberto Martin-Martin*
Developing the next generation of household robot helpers requires combining locomotion and interaction capabilities, which is generally referred to as mobile manipulation (MoMa). MoMa tasks are difficult due to the large action space of the robot and the common multi-objec...
"What's That Robot Doing Here?": Factors Influencing Perceptions Of Incidental Encounters With Autonomous Quadruped Robots.
TAS, 2023 | Elliott Hauser*, Yao-Cheng Chan*, Geethika Hemkumar*, Daksh Dua*, Parth Chonkar*, Efren Mendoza Enriquez*, Tiffany Kao*, Shikhar Gupta*, Huihai Wang*, Justin Hart*, Reuth Mirsky*, Joydeep Biswas*, Junfeng Jiao*, Peter Stone
Autonomous service robots in a public setting will generate hundreds of incidental human-robot encounters, yet researchers have only recently addressed this important topic in earnest. In this study, we hypothesized that visual indicators of human control, such as a leash on...
Task Phasing: Automated Curriculum Learning from Demonstrations
ICAPS, 2023 | Vaibhav Bajaj*, Guni Sharon*, Peter Stone
Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals. Common RL techniques for addressing such domains include (1) learning from demonstrations and (2) curriculum learning. While these two approaches ha...
Kinematic coordinations capture learning during human-exoskeleton interaction
SCIENTIFIC REPORTS, 2023 | Keya Ghonasgi*, Reuth Mirsky*, Nisha Bhargava*, Adrian M Haith*, Peter Stone, Ashish D Deshpande*
Human–exoskeleton interactions have the potential to bring about changes in human behavior for physical rehabilitation or skill augmentation. Despite significant advances in the design and control of these robots, their application to human training remains limited. The key o...
D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning.
AAMAS, 2023 | Caroline Wang*, Garrett Warnell*, Peter Stone
While combining imitation learning (IL) and reinforcement learning (RL) is a promising way to address poor sample efficiency in autonomous behavior acquisition, methods that do so typically assume that the requisite behavior demonstrations are provided by an expert that be...
MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection
ICLR, 2023 | Jiaxun Cui*, Xiaomeng Yang*, Mulong Luo*, Geunbae Lee*, Peter Stone, Hsien-Hsin S. Lee*, Benjamin Lee*, G. Edward Suh*, Wenjie Xiong*, Yuandong Tian*
Security vulnerabilities in computer systems raise serious concerns as computers process an unprecedented amount of private and sensitive data today. Cache-timing attacks (CTA) pose an important practical threat as they can effectively breach many protection mechanisms in t...
Learning Perceptual Hallucination for Multi-Robot Navigation in Narrow Hallways
ICRA, 2023 | Jin-Soo Park*, Xuesu Xiao*, Garrett Warnell*, Harel Yedidsion*, Peter Stone
While current systems for autonomous robot navigation can produce safe and efficient motion plans in static environments, they usually generate suboptimal behaviors when multiple robots must navigate together in confined spaces. For example, when two robots meet each other i...
Benchmarking Reinforcement Learning Techniques for Autonomous Navigation
ICRA, 2023 | Zifan Xu*, Bo Liu*, Xuesu Xiao*, Anirudh Nair*, Peter Stone
Deep reinforcement learning (RL) has brought many successes for autonomous robot navigation. However, there still exist important limitations that prevent real-world use of RL-based navigation systems. For example, most learning approaches lack safety guarantees; and learned na...
A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems
NEURAL NETWORKS, 2023 | Megan M. Baker*, Alexander New*, Mario Aguilar-Simon*, Ziad Al-Halah*, Sébastien M. R. Arnold*, Ese Ben-Iwhiwhu*, Andrew P. Brna*, Ethan Brooks*, Ryan C. Brown*, Zachary Daniels*, Anurag Daram*, Fabien Delattre*, Ryan Dellana*, Eric Eaton*, Haotian Fu*, Kristen Grauman*, Jesse Hostetler*, Shariq Iqbal*, Cassandra Kent*, Nicholas Ketz*, Soheil Kolouri*, George Konidaris*, Dhireesha Kudithipudi*, Seungwon Lee*, Michael L. Littman*, Sandeep Madireddy*, Jorge A. Mendez*, Eric Q. Nguyen*, Christine D. Piatko*, Praveen K. Pilly*, Aswin Raghavan*, Abrar Rahman*, Santhosh Kumar Ramakrishnan*, Neale Ratzlaff*, Andrea Soltoggio*, Peter Stone, Indranil Sur*, Zhipeng Tang*, Saket Tiwari*, Kyle Vedder*, Felix Wang*, Zifan Xu*, Angel Yanguas-Gil*, Harel Yedidsion*, Shangqun Yu*, Gautam K. Vallabha*
Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to “real world” events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and syst...
Reward (Mis)design for autonomous driving
ARTIFICIAL INTELLIGENCE, 2023 | W. Bradley Knox*, Alessandro Allievi*, Holger Banzhaf*, Felix Schmitt*, Peter Stone
This article considers the problem of diagnosing certain common errors in reward design. Its insights are also applicable to the design of cost functions and performance metrics more generally. To diagnose common errors, we develop 8 simple sanity checks for identifying flaw...
The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications
AAAI, 2023 | Serena Booth*, W. Bradley Knox*, Julie Shah*, Scott Niekum*, Peter Stone, Alessandro Allievi*
In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance metric is often sparse. For example, a true task metric might encode a reward of 1 upon success and 0 otherwise. These sparse task metrics can be hard to learn from, so in pr...
Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning
AAAI, 2023 | Bo Liu*, Yihao Feng*, Qiang Liu*, Peter Stone
Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications, including manipulation and navigation problems in robotics. Especially in such robotics tasks, sample efficiency is of the utmost importance for GCRL since, by default, the ...
Multimodal Embodied Attribute Learning by Robots for Object-Centric Action Policies.
AUTONOMOUS ROBOTS, 2023 | Xiaohan Zhang*, Saeid Amiri*, Jivko Sinapov*, Jesse Thomason*, Peter Stone, Shiqi Zhang*
Robots frequently need to perceive object attributes, such as red, heavy, and empty, using multimodal exploratory behaviors, such as look, lift, and shake. One possible way for robots to do so is to learn a classifier for each perceivable attribute given an exploratory behav...
DM2: Distributed Multi-Agent Reinforcement Learning via Distribution Matching
AAAI, 2023 | Caroline Wang*, Ishan Durugkar, Elad Liebman*, Peter Stone
Current approaches to multi-agent cooperation rely heavily on centralized mechanisms or explicit communication protocols to ensure convergence. This paper studies the problem of distributed multi-agent learning without resorting to centralized components or explicit communic...
BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
NEURIPS, 2022 | Bo Liu*, Mao Ye*, Stephen Wright*, Peter Stone, Qiang Liu*
Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning. Conventional BO methods need to differentiate through the...
Value Function Decomposition for Iterative Design of Reinforcement Learning Agents
NEURIPS, 2022 | James MacGlashan, Evan Archer, Alisa Devlic, Takuma Seno, Craig Sherstan, Peter R. Wurman, Peter Stone
Designing reinforcement learning (RL) agents is typically a difficult process that requires numerous design iterations. Learning can fail for a multitude of reasons and standard RL methods provide too few tools to provide insight into the exact cause. In this paper, we show ...
Quantifying Changes in Kinematic Behavior of a Human-Exoskeleton Interactive System
IROS, 2022 | Keya Ghonasgi*, Reuth Mirsky*, Adrian M Haith*, Peter Stone, Ashish D Deshpande*
While human-robot interaction studies are becoming more common, quantification of the effects of repeated interaction with an exoskeleton remains unexplored. We draw upon existing literature in human skill assessment and present extrinsic and intrinsic performance metrics t...
Dynamic Sparse Training for Deep Reinforcement Learning
IJCAI, 2022 | Ghada Sokar, Elena Mocanu, Decebal Constantin Mocanu, Mykola Pechenizkiy, Peter Stone
Deep reinforcement learning (DRL) agents are trained through trial-and-error interactions with the environment. This leads to a long training time for dense neural networks to achieve good performance. Hence, prohibitive computation and memory resources are consumed. Recentl...
Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning
NATURE, 2022 | Pete Wurman, Samuel Barrett, Kenta Kawamoto, James MacGlashan, Kaushik Subramanian, Thomas Walsh, Roberto Capobianco, Alisa Devlic, Franziska Eckert, Florian Fuchs, Leilani Gilpin, Piyush Khandelwal, Varun Kompella, Hao Chih Lin, Patrick MacAlpine, Declan Oller, Takuma Seno, Craig Sherstan, Michael D. Thomure, Houmehr Aghabozorgi, Leon Barrett, Rory Douglas, Dion Whitehead Amago, Peter Dürr, Peter Stone, Michael Spranger, Hiroaki Kitano
Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block...
Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog
IJCAI, JAIR, 2021 | Jesse Thomason*, Aishwarya Padmakumar*, Jivko Sinapov*, Nick Walker*, Yuqian Jiang*, Harel Yedidsion*, Justin Hart*, Peter Stone, Raymond J. Mooney*
In this work, we present methods for using human-robot dialog to improve language understanding for a mobile robot agent. The agent parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red a...
Agent-Based Markov Modeling for Improved COVID-19 Mitigation Policies
JAIR, 2021 | Roberto Capobianco, Varun Kompella, James Ault*, Guni Sharon*, Stacy Jong*, Spencer Fox*, Lauren Meyers*, Pete Wurman, Peter Stone
The year 2020 saw the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world have been faced with the challenge of protecting public health while keeping the economy running to the greatest extent possible. Epidemiologi...
Efficient Real-Time Inference in Temporal Convolution Networks
ICRA, 2021 | Piyush Khandelwal, James MacGlashan, Pete Wurman, Peter Stone
It has been recently demonstrated that Temporal Convolution Networks (TCNs) provide state-of-the-art results in many problem domains where the input data is a time-series. TCNs typically incorporate information from a long history of inputs (the receptive field) into a singl...
Multiagent Epidemiologic Inference through Realtime Contact Tracing
AAMAS, 2021 | Guni Sharon*, James Ault*, Peter Stone, Varun Kompella, Roberto Capobianco
This paper addresses an epidemiologic inference problem where, given realtime observation of test results, presence of symptoms, and physical contacts, the most likely infected individuals need to be inferred. The inference problem is modeled as a hidden Markov model where inf...
Expected Value of Communication for Planning in Ad Hoc Teamwork
AAAI, 2021 | William Macke*, Reuth Mirsky*, Peter Stone
A desirable goal for autonomous agents is to be able to coordinate on the fly with previously unknown teammates. Known as "ad hoc teamwork", enabling such a capability has been receiving increasing attention in the research community. One of the central challenges in ad hoc ...
Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks
AAAI, 2021 | Yuqian Jiang*, Sudarshanan Bharadwaj*, Bo Wu*, Rishi Shah*, Ufuk Topcu*, Peter Stone
In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted reward formulation. As usual, learning an optimal policy in this setting typically requires a large amount of training experiences. Reward...
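The reward-shaping idea the abstract builds on is easiest to see in its classic potential-based form, where a shaping bonus derived from a potential function is added to the environment reward without changing which policies are optimal (in the discounted setting). This is a generic sketch of potential-based shaping with toy names, not the paper's temporal-logic construction:

```python
def shaped_reward(r, s, s_next, potential, gamma=0.99):
    """Potential-based reward shaping: add F(s, s') = gamma*phi(s') - phi(s)
    to the environment reward. In the discounted setting this provably
    preserves optimal policies; the continuing/average-reward setting the
    paper studies needs additional care."""
    return r + gamma * potential(s_next) - potential(s)

phi = lambda s: float(s)  # toy potential: e.g., progress toward a goal
r_shaped = shaped_reward(0.0, s=1.0, s_next=2.0, potential=phi)  # 0.99*2 - 1 = 0.98
```

Moving toward higher-potential states yields a positive bonus even when the environment reward is zero, which is what makes shaping useful for sparse continuing tasks.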
Goal Blending for Responsive Shared Autonomy in a Navigating Vehicle
AAAI, 2021 | Yu-Sian Jiang*, Garrett Warnell*, Peter Stone
Human-robot shared autonomy techniques for vehicle navigation hold promise for reducing a human driver's workload, ensuring safety, and improving navigation efficiency. However, because typical techniques achieve these improvements by effectively removing human control at cr...
A Penny for Your Thoughts: The Value of Communication in Ad Hoc Teamwork
IJCAI, 2021 | Reuth Mirsky*, William Macke*, Andy Wang*, Harel Yedidsion*, Peter Stone
In ad hoc teamwork, multiple agents need to collaborate without having knowledge about their teammates or their plans a priori. A common assumption in this research area is that the agents cannot communicate. However, just as two random people may speak the same language, au...
Balancing Individual Preferences and Shared Objectives in Multiagent Reinforcement Learning
IJCAI, 2021 | Ishan Durugkar, Elad Liebman*, Peter Stone
In multiagent reinforcement learning scenarios, it is often the case that independent agents must jointly learn to perform a cooperative task. This paper focuses on such a scenario in which agents have individual preferences regarding how to accomplish the shared task. We co...
Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
NEURIPS, 2020 | Lemeng Wu*, Bo Liu*, Peter Stone, Qiang Liu*
We propose firefly neural architecture descent, a general framework for progressively and dynamically growing neural networks to jointly optimize the networks' parameters and architectures. Our method works in a steepest descent fashion, which iteratively finds the best netw...
An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch
NEURIPS, 2020 | Siddharth Desai*, Ishan Durugkar, Haresh Karnan*, Garrett Warnell*, Josiah Hanna*, Peter Stone
We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning. This problem is par...
Reinforcement Learning for Optimization of COVID-19 Mitigation Policies
AAAI AI FOR SOCIAL GOOD, 2020 | Varun Kompella, Roberto Capobianco, Stacy Jong*, Jonathan Browne*, Spencer Fox*, Lauren Meyers*, Pete Wurman, Peter Stone
The year 2020 has seen the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world are faced with the challenge of protecting public health, while keeping the economy running to the greatest extent possible. Epidemiologi...
Blog Posts
Sony AI at the Reinforcement Learning Conference 2024
August 10, 2024 | Peter Stone, Game AI, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Miguel Vasco*, Peter R. Wurman
Sony AI will be participating in the Reinforcement Learning (RL) Conference in Amherst, Massachusetts, from August 9 to 12, 2024 where we will be ...
Event Tables for Efficient Experience Replay
December 14, 2023 | Peter Stone, Pete Wurman, Game AI, GT Sophy, Thomas Walsh, Samuel Barrett, Varun Kompella
Each of us carries a core set of experiences, events that stand out as particularly important and have shaped our lives more than an average day. ...
Sony AI Reveals New Research Contributions at NeurIPS 2023
December 13, 2023 | Peter Stone, Alice Xiang, Jerone Andrews, Events, Kazuki Shimada, Apostolos Modas, Tarek Besold, William Thong, Dora Zhao*, Lingjuan Lyu, Orestis Papakyriakopoulos*, Xin Dong, Nidham Gazagnadou, Weiming Zhuang, Vivek Sharma, Yuki Mitsufuji, Chen Chen
Sony Group Corporation and Sony AI have been active participants in the annual NeurIPS Conference for years, contributing pivotal research that has ...
RPOSST: Testing an AI Agent for Deployment in the Real World
December 5, 2023 | Peter Stone, Pete Wurman, Game AI, Thomas Walsh, Dustin Morrill
Bleary-eyed engineers know the anxiety that comes with a deployment, and the importance of testing every aspect of a product before it goes to the ...
RoboCup and Its Role in the History and Future of AI
June 17, 2021 | Peter Stone, Robotics
As I write this blog post, we're a few days away from the opening of the 2021 RoboCup Competitions and Symposium. Running from June 22nd-28th, this ...
The Challenge to Create a Pandemic Simulator
March 3, 2021 | Peter Stone, Life at Sony AI, Varun Kompella, Roberto Capobianco
The thing I like most about working at Sony AI is the quality of the projects we're working on, both for their scientific challenges and for their ...