SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning

Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno

ICLR, 2025

Abstract

Recent advances in CV and NLP have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simplicity bias, guiding models toward simple and generalizable solutions. However, in deep RL, designing and scaling up networks has been less explored. Motivated by this opportunity, we present SimBa, an architecture designed to scale up parameters in deep RL by injecting a simplicity bias. SimBa consists of three components: (i) an observation normalization layer that standardizes inputs with running statistics, (ii) a residual feedforward block that provides a linear pathway from input to output, and (iii) a layer normalization that controls feature magnitudes. By scaling up parameters with SimBa, the sample efficiency of various deep RL algorithms, including off-policy, on-policy, and unsupervised methods, is consistently improved. Moreover, solely by integrating the SimBa architecture into SAC, it matches or surpasses state-of-the-art deep RL methods with high computational efficiency across DMC, MyoSuite, and HumanoidBench. These results demonstrate SimBa's broad applicability and effectiveness across diverse RL algorithms and environments.
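
The abstract describes SimBa as a stack of three components: running-statistics observation normalization, residual feedforward blocks with an identity pathway, and layer normalization. Below is a minimal PyTorch-style sketch of how these pieces might be wired together; the class names (RunningNorm, SimBaBlock, SimBaEncoder) and hyperparameters (hidden_dim, expansion, num_blocks) are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of the three SimBa components named in the abstract, assuming
# PyTorch. All class and argument names here are illustrative assumptions.
import torch
import torch.nn as nn


class RunningNorm(nn.Module):
    """Observation normalization using running mean/variance statistics."""

    def __init__(self, obs_dim: int, eps: float = 1e-5):
        super().__init__()
        self.register_buffer("mean", torch.zeros(obs_dim))
        self.register_buffer("var", torch.ones(obs_dim))
        self.register_buffer("count", torch.tensor(eps))
        self.eps = eps

    @torch.no_grad()
    def update(self, obs: torch.Tensor) -> None:
        # Batched (Welford-style) update of the running statistics.
        batch_mean = obs.mean(dim=0)
        batch_var = obs.var(dim=0, unbiased=False)
        batch_count = obs.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = self.var * self.count + batch_var * batch_count \
            + delta.pow(2) * self.count * batch_count / total
        self.mean.copy_(new_mean)
        self.var.copy_(m2 / total)
        self.count.copy_(total)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return (obs - self.mean) / torch.sqrt(self.var + self.eps)


class SimBaBlock(nn.Module):
    """Pre-norm residual feedforward block: an identity (linear) pathway from
    input to output plus an MLP branch applied to layer-normalized features."""

    def __init__(self, hidden_dim: int, expansion: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, expansion * hidden_dim),
            nn.ReLU(),
            nn.Linear(expansion * hidden_dim, hidden_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(self.norm(x))  # residual pathway


class SimBaEncoder(nn.Module):
    """Observation norm -> linear embedding -> residual blocks -> LayerNorm."""

    def __init__(self, obs_dim: int, hidden_dim: int = 256, num_blocks: int = 2):
        super().__init__()
        self.obs_norm = RunningNorm(obs_dim)
        self.embed = nn.Linear(obs_dim, hidden_dim)
        self.blocks = nn.Sequential(*[SimBaBlock(hidden_dim) for _ in range(num_blocks)])
        self.out_norm = nn.LayerNorm(hidden_dim)  # controls feature magnitudes

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.out_norm(self.blocks(self.embed(self.obs_norm(obs))))


# Example usage: encode a batch of 32 observations with 67 dimensions
# (dimensions chosen arbitrarily for illustration).
encoder = SimBaEncoder(obs_dim=67)
obs = torch.randn(32, 67)
encoder.obs_norm.update(obs)   # refresh running statistics with the new batch
features = encoder(obs)        # -> (32, 256) features for an actor/critic head
```

The encoder output would then feed the policy or value heads of an RL algorithm such as SAC or PPO, which is how the abstract describes scaling up parameters while retaining a simplicity bias.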

