Authors
- Ce Hao*
- Catherine Weaver*
- Chen Tang*
- Kenta Kawamoto
- Masayoshi Tomizuka*
- Wei Zhan*
* External authors
Venue
- RAL 2024
Date
- 2024
Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning
Abstract
Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Skills, i.e., sequences of primitive actions, have shown promising results in sparse-reward environments. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Skill-Critic optimizes both the low-level and high-level policies; both are initialized and regularized by the latent space learned from offline demonstrations, which guides the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance.
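The abstract does not spell out Skill-Critic's update rules, so the sketch below is only a generic illustration of the two ideas it names, not the authors' implementation. The first idea is temporal abstraction: a high-level policy picks a skill latent every few steps, and a low-level policy decodes primitive actions from it. The second is regularizing a policy toward a prior learned from offline demonstrations, shown here as a KL penalty on the reward. The diagonal Gaussian policies, the fixed skill length `skill_len`, the penalty weight `alpha`, and all function names are hypothetical assumptions made for illustration.

```python
import math


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) for diagonal Gaussians p and q, summed over dimensions."""
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        kl += math.log(sq / sp) + (sp**2 + (mp - mq) ** 2) / (2.0 * sq**2) - 0.5
    return kl


def regularized_reward(reward, policy_mu, policy_std, prior_mu, prior_std, alpha):
    """Task reward minus alpha * KL(policy || prior).

    Assumption: the prior stands in for a distribution learned from offline
    demonstrations; alpha is a hypothetical weight, not a value from the paper.
    """
    return reward - alpha * gaussian_kl(policy_mu, policy_std, prior_mu, prior_std)


def hierarchical_rollout(high_policy, low_policy, step_env, state, horizon, skill_len):
    """Run `horizon` primitive steps, re-selecting a skill latent every `skill_len` steps.

    Hypothetical callables:
      high_policy(state) -> z                    (high level: skill selection)
      low_policy(state, z, k) -> action          (low level: k-th step of skill z)
      step_env(state, action) -> (state, reward)
    """
    total_reward = 0.0
    z = None
    for t in range(horizon):
        if t % skill_len == 0:
            z = high_policy(state)  # choose a new skill at each skill boundary
        action = low_policy(state, z, t % skill_len)  # decode a primitive action
        state, reward = step_env(state, action)
        total_reward += reward
    return total_reward
```

With `horizon=10` and `skill_len=4`, the high-level policy is queried only at steps 0, 4, and 8, while the low-level policy acts at every step. This is the sense in which the high level operates on a coarser time scale. Fine-tuning the low level, as the abstract describes, would mean updating `low_policy` as well rather than keeping it frozen.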