Authors
- Hojoon Lee
- Youngdo Lee
- Takuma Seno
- Donghu Kim
- Peter Stone
- Jaegul Choo
Venue
- ICML-25
Date
- 2025
Hyperspherical Normalization for Scalable Deep Reinforcement Learning
Hojoon Lee
Youngdo Lee
Takuma Seno
Donghu Kim
Jaegul Choo
ICML-25
2025
Abstract
Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norm by hyperspherical normalization; and (ii) using a distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using the soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains.
Related Publications
Having explored an environment, intelligent agents should be able to transfer their knowledge to most downstream tasks within that environment. Referred to as ``zero-shot learning," this ability remains elusive for general-purpose reinforcement learning algorithms. While rec…
Deep reinforcement learning has achieved superhuman racing performance in high-fidelity simulators like Gran Turismo 7 (GT7). It typically utilizes global features that require instrumentation external to a car, such as precise localization of agents and opponents, limiting …
While existing vision and multi-modal foundation models can handle multiple computer vision tasks, they often suffer from significant limitations, including huge demand for data and computational resources during training and inconsistent performance across vision tasks at d…
JOIN US
Shape the Future of AI with Sony AI
We want to hear from those of you who have a strong desire
to shape the future of AI.