* External authors




Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators

Paavo Parmas*

Takuma Seno

Yuma Aoki*


ICML 2023



In model-based reinforcement learning (MBRL), policy gradients can be estimated either by derivative-free RL methods, such as likelihood ratio gradients (LR), or by backpropagating through a differentiable model via reparameterization gradients (RP). Instead of using one or the other, the Total Propagation (TP) algorithm in prior work showed that a combination of LR and RP estimators, averaged using inverse variance weighting (IVW), can achieve orders of magnitude improvement over either method. However, IVW-based composite estimators have not yet been applied in modern RL tasks, as it is unclear whether they can be implemented scalably. We propose a scalable method, Total Propagation X (TPX), that improves over TP by changing the node used for IVW and by employing coordinate-wise weighting. We demonstrate the scalability of TPX by applying it to the state-of-the-art visual MBRL algorithm Dreamer. The experiments showed that Dreamer fails with long simulation horizons, while TPX works reliably for only a fraction of additional computation. One key advantage of TPX is its ease of implementation, which will enable experimenting with IVW on many tasks beyond MBRL.
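
To make the idea of inverse variance weighting concrete, the following is a minimal sketch of coordinate-wise IVW on a toy Gaussian problem. It is not the TPX algorithm from the paper (which concerns where in the computation graph the weighting is applied, something this sketch does not model); it only illustrates how two unbiased gradient estimators can be averaged with per-coordinate weights proportional to the inverse of their estimated variances. The function names `ivw_combine` and `toy_gradients`, the objective f(x) = sum(x^2), and the small constant `eps` are all assumptions made for illustration.

```python
import numpy as np


def ivw_combine(lr_samples, rp_samples, eps=1e-12):
    """Coordinate-wise inverse variance weighting of two gradient estimators.

    lr_samples, rp_samples: arrays of shape (n_samples, n_params) holding
    per-sample gradient estimates (hypothetical interface for this sketch).
    Returns (combined_gradient, weight_on_lr), each of shape (n_params,).
    """
    lr_samples = np.asarray(lr_samples, dtype=float)
    rp_samples = np.asarray(rp_samples, dtype=float)

    # Estimated variance of each sample mean, separately per coordinate.
    var_lr = lr_samples.var(axis=0, ddof=1) / lr_samples.shape[0]
    var_rp = rp_samples.var(axis=0, ddof=1) / rp_samples.shape[0]

    # w_lr = (1/var_lr) / (1/var_lr + 1/var_rp) = var_rp / (var_lr + var_rp).
    # eps only guards against division by zero when both variances vanish.
    w_lr = (var_rp + eps) / (var_lr + var_rp + 2.0 * eps)

    combined = w_lr * lr_samples.mean(axis=0) + (1.0 - w_lr) * rp_samples.mean(axis=0)
    return combined, w_lr


def toy_gradients(mu, sigma, n, rng):
    """Per-sample LR and RP estimates of d/dmu E[sum(x^2)], x ~ N(mu, sigma^2 I)."""
    noise = rng.standard_normal((n, mu.size))
    x = mu + sigma * noise
    f = (x ** 2).sum(axis=1, keepdims=True)
    lr = f * (x - mu) / sigma ** 2  # likelihood ratio (score function) estimator
    rp = 2.0 * x                    # reparameterization (pathwise) estimator
    return lr, rp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0])
    lr, rp = toy_gradients(mu, sigma=0.5, n=20000, rng=rng)
    grad, w_lr = ivw_combine(lr, rp)
    print("true gradient:   ", 2.0 * mu)
    print("IVW estimate:    ", grad)
    print("weight on LR:    ", w_lr)
```

In this toy setting the RP estimator has much lower variance, so nearly all of the weight goes to it. The weights are estimated from the same samples they are applied to, which is a simplification that can introduce a small bias in general.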

Related Publications

Proppo: a Message Passing Framework for Customizable and Composable Learning Algorithms

NeurIPS, 2022
Paavo Parmas*, Takuma Seno

While existing automatic differentiation (AD) frameworks allow flexibly composing model architectures, they do not provide the same flexibility for composing learning algorithms---everything has to be implemented in terms of back propagation. To address this gap, we invent A…

Value Function Decomposition for Iterative Design of Reinforcement Learning Agents

NeurIPS, 2022
James MacGlashan, Evan Archer, Alisa Devlic, Takuma Seno, Craig Sherstan, Peter R. Wurman, Peter Stone

Designing reinforcement learning (RL) agents is typically a difficult process that requires numerous design iterations. Learning can fail for a multitude of reasons and standard RL methods provide too few tools to provide insight into the exact cause. In this paper, we show …

d3rlpy: An Offline Deep Reinforcement Learning Library

Journal of Machine Learning Research, 2022
Takuma Seno, Michita Imai*

In this paper, we introduce d3rlpy, an open-sourced offline deep reinforcement learning (RL) library for Python. d3rlpy supports a set of offline deep RL algorithms as well as off-policy online algorithms via a fully documented plug-and-play API. To address a reproducibility…


