Residual-MPPI: Online Policy Customization for Continuous Control

Pengcheng Wang

Chenran Li

Catherine Weaver*

Kenta Kawamoto

Masayoshi Tomizuka*

Chen Tang*

Wei Zhan*

* External authors

ICLR-25

2025

Abstract

Policies learned through Reinforcement Learning (RL) and Imitation Learning (IL) have demonstrated significant potential in achieving advanced performance in continuous control tasks. However, in real-world environments, it is often necessary to further customize a trained policy when there are additional requirements that were unforeseen during the original training phase. It is possible to fine-tune the policy to meet the new requirements, but this often requires collecting new data with the added requirements, as well as access to the original training metric and policy parameters. In contrast, an online planning algorithm, if capable of meeting the additional requirements, can eliminate the need for extensive training phases and customize the policy without knowledge of the original training scheme or task. In this work, we propose a generic online planning algorithm for customizing continuous-control policies at execution time, which we call Residual-MPPI. It can customize a given prior policy on new performance metrics in few-shot and even zero-shot online settings. Moreover, Residual-MPPI requires access only to the action distribution produced by the prior policy, without additional knowledge of the original task. Through our experiments, we demonstrate that the proposed Residual-MPPI algorithm can accomplish the few-shot/zero-shot online policy customization task effectively, including customizing the champion-level racing agent Gran Turismo Sophy (GT Sophy) 1.0 in the challenging Gran Turismo Sport (GTS) car racing environment. Demo videos are available on our website:
https://sites.google.com/view/residual-mppi
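
To make the idea of planning on top of a prior policy's action distribution more concrete, the following is a toy sketch only, not the algorithm from the paper. It assumes a hypothetical 1-D integrator system, a hypothetical Gaussian prior policy (`prior_policy`), and a hypothetical add-on objective that penalizes large actions (`addon_reward`). The planner samples rollouts from the prior, scores each one by the add-on reward plus a weighted log-likelihood under the prior, and returns an exponentially weighted average of the first actions in the style of MPPI. All names, constants, and the simplified weighting are assumptions made for illustration.

```python
import numpy as np

def prior_policy(state):
    """Hypothetical prior: Gaussian action distribution steering toward x = 1."""
    mean = np.clip(1.0 - state, -1.0, 1.0)
    std = 0.5
    return mean, std

def dynamics(state, action):
    """Toy 1-D integrator, assumed to be known to the planner."""
    return state + 0.1 * action

def addon_reward(state, action):
    """Hypothetical new requirement: penalize large actions."""
    return -2.0 * action ** 2

def gaussian_logpdf(x, mean, std):
    """Log-density of a Gaussian, used as the prior's action log-likelihood."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def plan_action(state, rng, horizon=3, samples=4096,
                temperature=1.0, prior_weight=0.5):
    """Simplified MPPI-style step: weight prior-sampled rollouts by their score."""
    states = np.full(samples, state, dtype=float)
    scores = np.zeros(samples)
    first_actions = None
    for t in range(horizon):
        mean, std = prior_policy(states)
        # Sample candidate actions from the prior's action distribution.
        actions = mean + std * rng.standard_normal(samples)
        # Score: add-on reward plus weighted log-likelihood under the prior.
        scores += addon_reward(states, actions)
        scores += prior_weight * gaussian_logpdf(actions, mean, std)
        if t == 0:
            first_actions = actions
        states = dynamics(states, actions)
    # Exponentially weighted average of first actions (max subtracted for stability).
    weights = np.exp((scores - scores.max()) / temperature)
    weights /= weights.sum()
    return float(np.sum(weights * first_actions))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prior_mean, _ = prior_policy(0.0)
    print("prior mean action:", float(prior_mean))
    print("customized action:", plan_action(0.0, rng))
```

In this toy setting the planner only queries `prior_policy` for a mean and standard deviation, which mirrors the abstract's point that access to the prior's action distribution is sufficient. The planned action is pulled toward smaller magnitudes than the prior mean because of the assumed action penalty.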

Related Publications

A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7

RA-L, 2025
Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter R. Wurman

Deep reinforcement learning has achieved superhuman racing performance in high-fidelity simulators like Gran Turismo 7 (GT7). It typically utilizes global features that require instrumentation external to a car, such as precise localization of agents and opponents, limiting …

A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo

RLC, 2024
Miguel Vasco*, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Pete Wurman, Peter Stone

Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics. Recently, an end-to-end deep reinforcement learning agent met this challenge in a high-fidelity racing simulator, Gran Tu…

BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay

RA-L, 2024
Catherine Weaver*, Chen Tang*, Ce Hao*, Kenta Kawamoto, Masayoshi Tomizuka*, Wei Zhan*

Autonomous racing poses a significant challenge for control, requiring planning minimum-time trajectories under uncertain dynamics and controlling vehicles at their handling limits. Current methods requiring hand-designed physical models or reward functions specific to each …
