Authors

* External authors

Date

A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo

Miguel Vasco*

Takuma Seno

Kenta Kawamoto

Kaushik Subramanian

Pete Wurman

Peter Stone

2024

Reinforcement Learning Conference (RLC), 2024
*Equal Contribution, Work done during his internship at Tokyo Laboratory, Sony AI.

We contribute the first vision-based super-human car racing agent, able to outperform the best human drivers in time-trial races in Gran Turismo™ 7.


Abstract

Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics. Recently, an end-to-end deep reinforcement learning agent met this challenge in a high-fidelity racing simulator, Gran Turismo. However, this agent relied on global features that require instrumentation external to the car. This paper introduces, to the best of our knowledge, the first super-human car racing agent whose sensor input is purely local to the car, namely pixels from an ego-centric camera view and quantities that can be sensed from on-board the car, such as the car's velocity. By leveraging global features only at training time, the learned agent is able to outperform the best human drivers in time trial (one car on the track at a time) races using only local input features. The resulting agent is evaluated in Gran Turismo 7 on multiple tracks and cars. Detailed ablation experiments demonstrate the agent's strong reliance on visual inputs, making it the first vision-based super-human car racing agent.


Designing a Super-Human Vision Racing Agent

Observations

We build multimodal observations for our racing agent from local and global features. For local features, we consider the image and proprioceptive information (e.g., velocity and acceleration of the car). For global features, we consider information related to the shape of the course.

Image
\(\mathbf{o}_t^I \in \mathbb{R}^{64 \times 64 \times 3}\)

Proprioceptive
\(\mathbf{o}_t^p \in \mathbb{R}^{17}\)

Course Shape
\(\mathbf{o}_t^g \in \mathbb{R}^{531}\)
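As an illustration only, the three observation components can be pictured as a single container that is split into local and global parts. The sketch below is not the authors' implementation: the class name `Observation`, its field names, and the helper `blank_observation` are hypothetical, and only the dimensions (64 × 64 × 3, 17, and 531) come from the text above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Dimensions taken from the text; everything else here is a hypothetical sketch.
IMAGE_SHAPE = (64, 64, 3)  # height x width x channels
PROPRIO_DIM = 17           # on-board quantities such as velocity and acceleration
COURSE_DIM = 531           # course-shape features (global)


@dataclass
class Observation:
    image: List[List[List[float]]]  # nested lists laid out as H x W x C
    proprio: List[float]
    course: List[float]

    def validate(self) -> None:
        """Check that every component has the dimensions stated in the text."""
        h, w, c = IMAGE_SHAPE
        assert len(self.image) == h
        assert all(len(row) == w for row in self.image)
        assert all(len(px) == c for row in self.image for px in row)
        assert len(self.proprio) == PROPRIO_DIM
        assert len(self.course) == COURSE_DIM

    def local_features(self) -> Tuple[List[List[List[float]]], List[float]]:
        """Features that can be sensed from the car itself."""
        return self.image, self.proprio

    def global_features(self) -> List[float]:
        """Features that need information external to the car."""
        return self.course


def blank_observation() -> Observation:
    """Build an all-zero observation with the right shapes (for testing only)."""
    h, w, c = IMAGE_SHAPE
    image = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    return Observation(image, [0.0] * PROPRIO_DIM, [0.0] * COURSE_DIM)
```

Keeping the local/global split explicit in the data structure makes it harder to pass course-shape features to a component that should only see on-board inputs.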

Actions

Our agent outputs a delta steering angle and a combined throttle and brake value. Gear shifting is handled by automatic transmission. We set the control frequency to 10 Hz; the game, which runs at 60 Hz, linearly interpolates the steering angle between control steps.

Delta Steering Angle
\(\mathbf{a}_t^0 \in [-3^\circ, 3^\circ]\)

Throttle and Brake
\(\mathbf{a}_t^1 \in [-1, 1]\)
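The action interface can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulator's API: the function names are hypothetical, the convention that positive values of the combined signal mean throttle and negative values mean brake is an assumption, and no limit on the absolute steering angle is modeled because the text does not give one. Only the ±3° delta range, the [-1, 1] pedal range, and the 10 Hz / 60 Hz rates come from the text.

```python
from typing import List, Tuple

MAX_DELTA_DEG = 3.0                # delta steering range from the text
CONTROL_HZ = 10                    # agent control frequency from the text
GAME_HZ = 60                       # game update frequency from the text
SUBSTEPS = GAME_HZ // CONTROL_HZ   # 6 game frames per control step


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def split_pedals(a1: float) -> Tuple[float, float]:
    """Map the combined signal in [-1, 1] to (throttle, brake).

    Assumption: positive values are throttle, negative values are brake.
    """
    a1 = clip(a1, -1.0, 1.0)
    return max(a1, 0.0), max(-a1, 0.0)


def interpolate_steering(prev_angle_deg: float, delta_deg: float) -> List[float]:
    """Steering angle for each of the game frames inside one control step.

    The delta is clipped to [-3, 3] degrees and applied linearly, so the
    last frame reaches prev_angle_deg + clipped delta.
    """
    delta = clip(delta_deg, -MAX_DELTA_DEG, MAX_DELTA_DEG)
    return [prev_angle_deg + delta * (k / SUBSTEPS) for k in range(1, SUBSTEPS + 1)]
```

For example, `interpolate_steering(0.0, 5.0)` clips the requested delta to 3° and spreads it over six game frames, ending at 3°.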

Reward Function

We designed a multi-component reward function for the agent that rewards track progression and penalizes collisions, off-track driving, and inconsistent driving.
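A reward of this shape could be sketched as a weighted sum of one positive term and three penalties. The sketch below is hypothetical: the text does not give the weights, the exact form of each term, or how inconsistent driving is measured. Here the weights are placeholders, and inconsistency is assumed, purely for illustration, to be the change in the steering command between consecutive steps.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Placeholder values, not taken from the paper.
    progress: float = 1.0
    collision: float = 1.0
    off_track: float = 1.0
    inconsistency: float = 0.1


def step_reward(
    progress_m: float,
    collided: bool,
    off_track: bool,
    steering_change_deg: float,
    w: RewardWeights = RewardWeights(),
) -> float:
    """Per-step reward: track progress minus penalties.

    progress_m: distance advanced along the track during this step.
    steering_change_deg: assumed proxy for inconsistent driving.
    """
    reward = w.progress * progress_m
    if collided:
        reward -= w.collision
    if off_track:
        reward -= w.off_track
    reward -= w.inconsistency * abs(steering_change_deg)
    return reward
```

Separating the weights from the terms makes it easy to ablate a single component, for instance by setting its weight to zero.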

Training Scheme

We train our agent with an asymmetric actor-critic architecture, in which the critic receives different input modalities from the policy. The policy network \(\pi_{\phi}\) receives proprioceptive information \(\mathbf{o}^p\) and image features \(h^I\), which a convolutional neural network \(q_{\theta}\) encodes from the image, and outputs actions \(\mathbf{a}\). The critic network \(Q_{\psi}\) receives the local proprioceptive observations together with the global observations \(\mathbf{o}^g\) and predicts quantiles of future returns. During execution, our agent receives only local features from the Gran Turismo™ 7 simulator.
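The asymmetry lies in which inputs reach which network. The sketch below illustrates only that routing and contains no learning code. It rests on several assumptions: `encode_image` is a trivial stand-in for the convolutional encoder, `FEATURE_DIM` is a made-up feature size, the function names are hypothetical, and feeding the action to the critic is inferred from the \(Q\) notation rather than stated in the text.

```python
from typing import List, Sequence

PROPRIO_DIM = 17   # from the text
COURSE_DIM = 531   # from the text
ACTION_DIM = 2     # delta steering, combined throttle/brake
FEATURE_DIM = 8    # hypothetical size of the image feature vector


def encode_image(image: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Stand-in for the CNN encoder: mean intensity of FEATURE_DIM row bands."""
    rows_per_band = len(image) // FEATURE_DIM
    features = []
    for b in range(FEATURE_DIM):
        band = image[b * rows_per_band:(b + 1) * rows_per_band]
        values = [v for row in band for px in row for v in px]
        features.append(sum(values) / len(values))
    return features


def actor_input(image, proprio: Sequence[float]) -> List[float]:
    """Policy input: image features plus proprioceptive data (local only)."""
    assert len(proprio) == PROPRIO_DIM
    return encode_image(image) + list(proprio)


def critic_input(
    proprio: Sequence[float], course: Sequence[float], action: Sequence[float]
) -> List[float]:
    """Critic input: proprioceptive and global course features, plus the action.

    The critic never sees the image; it is used only during training.
    """
    assert len(proprio) == PROPRIO_DIM
    assert len(course) == COURSE_DIM
    assert len(action) == ACTION_DIM
    return list(proprio) + list(course) + list(action)
```

Because `actor_input` takes no course-shape argument, the policy cannot depend on global features, which is what allows it to run from local inputs alone at execution time.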


Results

We evaluate our agent in time trial races, where the goal is to complete a lap of the track in the shortest possible time. We consider three scenarios in Gran Turismo™ 7 with different combinations of cars, tracks, and conditions (track time and weather): Monza, Tokyo, and Spa, modeled after real-world circuits and roads.

We compare our agent against over 130K human drivers in each scenario. In all three scenarios, our agent achieves super-human performance, with lap times that significantly surpass the performance of the best human player.

In the paper, we additionally highlight: (i) the importance of the asymmetric training scheme; (ii) novel driving behavior in comparison with the best human reference drivers; and (iii) the strong dependence of our agent's decision-making on image features.


BibTeX


@InProceedings{RLC24-sophy,
    author="Miguel Vasco and Takuma Seno and Kenta Kawamoto and Kaushik Subramanian and Peter R.\ Wurman and Peter Stone",
    title="A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in {G}ran {T}urismo",
    booktitle="Reinforcement Learning Conference (RLC)",
    month="August",
    year="2024",
    location="Amherst, MA, USA",
    abstract={Racing autonomous cars faster than the best human drivers
            has been a longstanding grand challenge for the fields of
            Artificial Intelligence and robotics. Recently, an
            end-to-end deep reinforcement learning agent met this
            challenge in a high-fidelity racing simulator, Gran
            Turismo. However, this agent relied on global features
            that require instrumentation external to the car. This
            paper introduces, to the best of our knowledge, the first
            super-human car racing agent whose sensor input is purely
            local to the car, namely pixels from an ego-centric camera
            view and quantities that can be sensed from on-board the
            car, such as the car's velocity. By leveraging global
            features only at training time, the learned agent is able
            to outperform the best human drivers in time trial (one
            car on the track at a time) races using only local input
            features. The resulting agent is evaluated in Gran Turismo
            7 on multiple tracks and cars. Detailed ablation
            experiments demonstrate the agent's strong reliance on
            visual inputs, making it the first vision-based
            super-human car racing agent.}
}
