Visual Reward Machines

Abstract

Non-markovian Reinforcement Learning (RL) tasks are extremely hard to solve, because intelligent agents must consider the entire history of state-action pairs to act rationally in the environment. Most works use Linear Temporal Logic (LTL) to specify temporally-extended tasks. This approach applies only in finite and discrete state environments or continuous problems for which a mapping between the continuous state and a symbolic interpretation is known as a symbol grounding function. In this work, we define Visual Reward Machines (VRM), an automata-based neurosymbolic framework that can be used for both reasoning and learning in non-symbolic non-markovian RL domains. VRM is a fully neural but interpretable system, that is based on the probabilistic relaxation of Moore Machines. Results show that VRMs can exploit ungrounded symbolic temporal knowledge to outperform baseline methods based on RNNs in non-markovian RL tasks.

View PDF

Visual Reward Machines

Abstract

Authors

Venue

Date

Share

Related Publications

Join Us on the Cutting Edge of AI Innovation