Gran Turismo Sophy™, Five Years On: From Nature Cover to Open Frontier

Written by Admin | Jul 1, 2026 2:43:23 AM

In early 2020, a handful of researchers had an ambitious idea: Build an AI agent that could beat the best drivers in the world at Gran Turismo™, the PlayStation® racing simulation game. Five years later, that agent is a permanent feature in a game played by millions. It is a case study in how research becomes a product and a platform generating new questions (and some answers) about the future of reinforcement learning.

This is the story of Gran Turismo Sophy™ (GT Sophy).

Before Sony AI: The Origins

The role of AI in Gran Turismo began before Sony AI existed. Around 2016, Kenta Kawamoto, then a researcher in Sony's R&D center, started exploring Gran Turismo as a testbed for reinforcement learning. Much of the early work focused on building an API between the game and an external agent. Two projects with interns and researchers at the University of Zurich furthered the exploration and produced the first published results: one achieving superhuman time trial performance on a single track, and a second demonstrating rudimentary passing skills. These early experiments ran on a handful of PlayStations connected to local networks.

When Sony AI was formed in April 2020, the team set far larger ambitions. As Peter Wurman, Sony AI Deputy President, described it, "we set our sights on beating expert human drivers in head-to-head racing with our initial team of a dozen world-class engineers and researchers."

The first task was infrastructure. The team built a custom distributed training system called Dart (Distributed Asynchronous Rollouts and Training) that connected to Sony Interactive Entertainment's (SIE) PlayStation in the Cloud resources. Over time, this unlocked access to more than 1,000 PlayStation 4 consoles, allowing the team to explore multiple research ideas in parallel. When Dart came online in January 2021, the research started in earnest. By then, the team had doubled in size.

The collaboration was three-way from the start. Michael Spranger, President of Sony AI, described it as "a unique collaboration between Sony AI, Polyphony Digital (PDI), and Sony Interactive Entertainment (SIE)" that was "truly unique to Sony." Polyphony Digital provided access to the game's physics engine and racing expertise. SIE provided the cloud PlayStation infrastructure. Sony AI provided reinforcement learning research.

Wurman reflected on the learning curve: "When it came to developing Gran Turismo Sophy, we had to learn all we could about not only gaming but racing simulation; and, ultimately, racing in general. This enabled us to account for changes in physics, racing dynamics, and more within our models and training."

For Wurman, the lesson was one he had encountered before, from warehouse robotics to racing AI: "One lesson I've learned in my career applying AI to real-world problems is the importance of having a deep understanding of the problem domain." In the case of racing, the team needed to internalize that "there is a lot more to winning a race than being the fastest (though that certainly helps!)."

"In just a year and a half, we went from an AI agent that struggled to stay on the track to one that could outrace the best human players in the world," Wurman noted.

Learning to Race, Learning to Be Fair

What made GT Sophy distinct from earlier AI milestones was not just speed.

It was sportsmanship.

The agent learned through a novel deep reinforcement learning algorithm called Quantile Regression Soft Actor-Critic (QR-SAC). It trained not from human demonstrations but from experience, optimizing a reward function that balanced track progress, collision avoidance, steering smoothness, and racing etiquette. A technique the team called "multi-table experience replay" partitioned the training buffer into different tables for each skill or scenario, keeping data proportions consistent throughout learning.
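The idea behind multi-table experience replay can be illustrated with a minimal sketch. This is not the team's implementation: the class name, table names, proportions, and capacity below are all hypothetical, and the only property taken from the description above is that each scenario gets its own table and every training batch draws from the tables at fixed proportions.

```python
import random
from collections import deque


class MultiTableReplay:
    """Replay buffer split into named tables, sampled at fixed proportions.

    Hypothetical sketch: names and defaults are illustrative assumptions.
    """

    def __init__(self, proportions, capacity_per_table=10_000, seed=0):
        total = sum(proportions.values())
        # Normalize so the per-table proportions sum to 1.
        self.proportions = {name: p / total for name, p in proportions.items()}
        # One bounded FIFO table per skill or scenario.
        self.tables = {name: deque(maxlen=capacity_per_table) for name in proportions}
        self.rng = random.Random(seed)

    def add(self, table, transition):
        """Store a transition in the table for its scenario."""
        self.tables[table].append(transition)

    def sample(self, batch_size):
        """Draw a batch whose per-table counts follow the fixed proportions."""
        names = list(self.tables)
        counts = {name: int(batch_size * self.proportions[name]) for name in names}
        # Flooring can leave a few slots unassigned; hand them to the first tables.
        remainder = batch_size - sum(counts.values())
        for name in names[:remainder]:
            counts[name] += 1
        batch = []
        for name in names:
            table = list(self.tables[name])
            if table:  # skip tables that have no data yet
                batch.extend(self.rng.choices(table, k=counts[name]))
        return batch
```

Because the batch composition is set by the proportions rather than by how much data each scenario happens to generate, a rare scenario (a race start, say) is not drowned out by a common one.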

But sportsmanship proved the hardest problem.

Just a week before the first Race Together exhibition in July 2021, internal tests revealed that GT Sophy was crashing into opponents on purpose and blocking aggressively. The feedback from Polyphony Digital's in-house testers was blunt: the agents would be disqualified for poor conduct if they raced that way in the actual event.

The team redesigned reward parameters, trained new agents, and ran rigorous tests in a single week. The tests revealed "just how thin the line is between an agent that wins at all costs and one that races competitively without violating racing etiquette," the team later reflected.

Sportsmanship in racing, unlike the rules of chess or Go, is imprecisely defined. It is enforced by experienced human stewards who watch replays and assign penalties based on context.

Alisa Devlic, Senior Research Scientist, described the scale of the challenge. "It had to be competitive and at the same time be a good sport, respecting the racing etiquette rules. The complexity of this challenge made it a formidable task for our team, as we did not have precisely defined rules available to us or a dataset to learn from. Decisions on penalties, following the car contacts in the race, are usually very contextualized and subjectively decided by human judges after watching the race replays."

As Wurman observed, "It was not easy to find a configuration in which GT Sophy was confident enough to hold its driving line and also be respectful of its opponents."

He described the result with a phrase that stuck: "GT Sophy is the first real AI that we could call a good sport."

Race Together: The Test

Two Race Together exhibition events in 2021 put GT Sophy against four of the world's best Gran Turismo drivers. In the July event, the humans won the overall points championship, 86 to 70. GT Sophy had some great moments, including first-place finishes in two of three races, but the final race at Circuit de la Sarthe exposed critical weaknesses in multi-car tactics at high speed.

The team had 111 days to address them. In the October rematch, with the same tracks, cars, and human competitors, an improved GT Sophy took the top two positions in every race and won the team competition 104 to 52.

Michael Spranger, President of Sony AI, called it "Sony AI's very first AI breakthrough" as the project was featured on the cover of Nature in February 2022.

On Feb. 10, 2022, Gran Turismo Sophy graced the cover of Nature

From Four Cars to… Three Hundred, and Beyond

What happened next is rare in AI research. Within a year of the Nature cover, GT Sophy moved from a laboratory result to a commercial product.

By February of 2023, Sony AI and Polyphony Digital released GT Sophy in Gran Turismo 7 for the PS5™ console. The Race Together mode gave players the chance to race against GT Sophy across four circuits.

Kazunori Yamauchi, President of Polyphony Digital (PDI), placed the moment in the franchise's history, calling it "a symbolic moment across the 25-year history of Gran Turismo" and comparing its significance to the evolution of the series' automotive physics simulation.

Spranger described the aspiration in broader terms: GT Sophy was never only about being superhuman. The goal, he said, was to "enhance the experience of players of all levels, and to make this experience available to everyone."

But four car-and-track combinations were only the beginning. Scaling to the full game required the team to solve a new class of problems.

Patrick MacAlpine, Sony AI Research Scientist, turned to SHAP analysis (SHapley Additive exPlanations) to identify which car features mattered. The results were counterintuitive: "one of the most important features is the drivetrain system of a car, while other features such as the dry weight of the car do not accurately capture the dynamical behavior of a car in a race," MacAlpine explained.

Florian Fuchs, AI Engineer, found an architectural solution. Rather than feeding all car data through a single channel, GT Sophy performed better when static car information was handled through a separate encoding channel, allowing the agent to first learn a shared representation of dynamic race conditions before mixing in the car-specific features.

The scale was considerable. "In addition to being able to drive over 300 cars, GT Sophy can drive each of them with any of the nine dry tire compounds available in the game," noted Takuma Seno, AI Engineer. Car coverage has continued to grow since.

Transitioning from research to permanent product required new testing infrastructure. MacAlpine described how automated tests running across more than 1,000 PlayStations in parallel helped Polyphony Digital meet tight deployment deadlines.

In November 2023, GT Sophy 2.0 launched globally as a permanent feature in GT7 Spec II, covering 340+ cars across nine tracks.

"GT Sophy is one of the first deep reinforcement learning agents to become a permanent component of a console game," Seno said. "This is a huge accomplishment in terms of the history of both AI and video game development."

Wurman framed the accomplishment in the context of what the team had set out to do: "We have evolved GT Sophy from a research project tackling the grand challenge of creating an AI agent that could outpace the world's best drivers to a commercial product that delivers a fun and exciting racing experience for all players."

Spranger reflected on the speed of the transition. Moving from Nature publication to deployment in under a year, he said, exemplified Sony AI's highest aspiration: bringing cutting-edge AI technology to millions of people, not just publishing it. In June 2024, GT Sophy received the AI Breakthrough Award for "Best Overall Use of AI in Gaming," helping to solidify the moment in AI history.

GT Sophy 2.1: More Control, More Players

In March 2025, GT Sophy 2.1 arrived in GT7's Custom Race mode, giving players greater flexibility and supporting well over 500 cars across an expanding set of tracks and race formats. With this flexibility, players could choose GT Sophy's car models and tracks from 19 options, customize car parts and tuning specifications, and set tire and fuel consumption rates.

The update reflected a shift in philosophy. Where earlier releases focused on showcasing GT Sophy's skill, 2.1 emphasized player agency.

"We are giving players more control than ever over their interactions with GT Sophy by allowing them to fine-tune gameplay, experiment with new strategies, and advance their racing skills," said Kaushik Subramanian, Senior Staff Research Scientist.

Wurman described the broader ambition behind the update: "Our goal was to provide players with an achievable, fun racing experience, whether they were a beginner or an advanced racer."

The team worked alongside Polyphony Digital to find the right balance between the behaviors GT Sophy exhibited, "ensuring that it could inspire players to elevate their technique; while also demonstrating sportsmanship, finesse, and racing temperament that would make races enjoyable and challenging for everyone."

Sony AI also trained GT Sophy to exhibit greater levels of sportsmanship, finesse, and temperament, balancing its racing behavior to create a more dynamic experience for players of all levels.

GT Sophy's Game Updates

GT Sophy’s evolution as a product did not stop at major version releases. Starting in mid-2024, Polyphony Digital and Sony AI expanded GT Sophy’s footprint through a series of game updates that progressively added new circuit support and grew car compatibility. Not every update included Sophy additions, but the pattern was consistent: updates typically introduced support for two new tracks alongside an expanding roster of compatible cars. Car coverage, which launched at 340 vehicles with Spec II, has since grown to well over 500. The Power Pack, released in December 2025, also expanded GT Sophy's track coverage to include dirt and snow surfaces for the first time.

Beginning in July 2024 with the addition of Nürburgring 24h and Autodrome Lago Maggiore (update 1.49), GT Sophy expanded to Brands Hatch GP and Dragon Trail Seaside (1.52, October 2024), Trial Mountain and Tokyo Expressway (1.54, November 2024), Interlagos and Mount Panorama (1.55, January 2025), Monza and Sardegna (1.56, February 2025), Barcelona-Catalunya (1.59, May 2025), Alsace Village (1.60, June 2025), and High Speed Ring (1.61, July 2025).

In December 2025, GT Sophy 3.0 arrived as part of the Power Pack paid DLC ($29.99) alongside the free Spec III update (Patch 1.65). The Power Pack introduced 50 new races across 20 themed categories, brought the return of 24-hour endurance racing, and featured full race-weekend formats spanning practice, qualifying, and the main event. GT Sophy 3.0 delivered what Polyphony Digital described as the most authentic AI racing behavior to date, available exclusively in the Power Pack on the PS5™.

Wurman placed the ongoing collaboration in perspective. "Our team is always looking for ways to enhance GT Sophy so that we can help PDI bring new and exciting features to players."

A Lineage of Milestones: GT Sophy and Ace

GT Sophy's influence extends beyond Gran Turismo. When Sony AI's Project Ace robot (the table tennis system accepted for publication in Nature in 2026) needed to learn striking skills from simulation and transfer them to the physical world, the team drew directly from GT Sophy's architecture.

Both systems share the same learning philosophy: develop agents that perceive, decide, and act autonomously under uncertainty. In published GT Sophy research, the team explored a privileged-critic reinforcement learning approach, in which a critic accesses additional state information during training to accelerate learning, even though the deployed policy operates without it. That research directly informed Ace's architecture, which uses the technique in practice.

Peter Dürr, Sony AI Director and Lead Engineer on Project Ace, had been skeptical of the approach. Then he saw the results from GT Sophy's privileged-critic research. "It totally blew my mind," Dürr recalled. "I didn't think this was possible at all, but with this kind of privileged information fed to the critic, it turns out the policy can learn how to do sensor fusion and anticipate the trajectory of a table-tennis ball."

Peter Stone, Chief Scientist at Sony AI, noted that achieving two such milestones in quick succession is "a rarity in the history of AI research." Historically, he observed, different institutions reached each major AI landmark. The same group accomplishing two within three years is highly unusual.

GT Sophy proved reinforcement learning could master expert-level control in a complex, high-speed virtual environment. Ace proved it could do so in the physical world, where sensing, latency, and hardware introduce entirely new challenges.

The Research Platform: Three Frontiers

While GT Sophy evolved as a product, it simultaneously became a foundation for new research. Three recent papers use GT Sophy to investigate open problems in reinforcement learning, problems that Stone articulated in an April 2026 talk at SXSW titled "Is Reinforcement Learning the Real Future of AI?"

Can Reward Design Be Automated?

GT Sophy's reward function was the product of years of iterative design, balancing eight atomic reward components through extensive experimentation. Wurman captured the rigor this demanded.

"Just because something seems to work doesn't mean that it is correct. Similarly, just because something isn't working doesn't mean it can't be made to work. Understanding why something is or isn't working is just as, or more, important than getting a good result."

At SXSW, Stone described the underlying tension: "There's sort of an art and a science to reinforcement learning, we sometimes say. So the science is once you have the reward function and the state and the action space, and you have the ability of the agent to generate experience, then the science is how can we make efficient algorithms to optimize the reward? The art is how do you define the state space, the action space, and the reward function?"

The paper "Automated Reward Design for Driving with Large Language Models" (Ma et al., NeurIPS Workshop 2025), on which Stone is a co-author, asks whether that art can be bypassed. The system uses large language models to generate reward functions from natural language prompts. Given a text description such as "drive fast while staying on the track and avoiding collisions," it produces reward code that trains agents to GT Sophy-competitive performance. The implication: the costliest step in bringing reinforcement learning to a new domain becomes automatable for structured settings where objectives are specifiable in language.
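To make "reward code" concrete, here is a minimal hand-written sketch of the kind of function such a prompt describes. It is not output from the paper's system and not GT Sophy's reward: the field names, the three terms, and the weights are all hypothetical, chosen only to mirror "drive fast while staying on the track and avoiding collisions."

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Per-step signals a simulator might expose (hypothetical fields)."""
    progress_m: float  # metres advanced along the track this step
    off_track: bool    # True if the car left the track this step
    collided: bool     # True if the car touched another car this step


def reward(step, w_progress=1.0, w_off_track=5.0, w_collision=10.0):
    """Weighted sum of atomic terms; the weights are illustrative assumptions."""
    r = w_progress * step.progress_m  # "drive fast"
    if step.off_track:
        r -= w_off_track              # "staying on the track"
    if step.collided:
        r -= w_collision              # "avoiding collisions"
    return r
```

Even in a toy like this, the weights decide whether the agent treats a collision as an acceptable price for extra progress, which is the balancing problem the hand-tuned reward had to solve.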

Can Champion Policies Be Adapted Without Retraining?

Stone has long advocated for combining AI methods rather than relying on any single approach. At SXSW, he described himself as "a believer of trying to take all of the different AI methods we have as tools in our toolbox and putting them together."

"Residual-MPPI: Online Policy Customization for Continuous Control" (Wang, Li, Weaver, Kawamoto, Tomizuka, Tang, Zhan; 2024) puts that philosophy into practice. The Berkeley-led paper, with Sony Research Japan co-author Kenta Kawamoto, introduces Residual-MPPI, a method that layers model predictive control over GT Sophy's pre-trained deep RL policy. Users can modify GT Sophy's behavior (drive conservatively, stay on course, adapt to personal preferences) using as few as 100 laps of corrective data. The baseline approach, Residual-SAC, requires 80,000 laps for similar customization. GT Sophy becomes a tunable platform, adaptable in real time without retraining the underlying network.
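The general idea of layering sampling-based model predictive control over a fixed policy can be sketched on a one-dimensional toy problem. This is a loose illustration in the spirit of MPPI, not the paper's algorithm: the dynamics, the stand-in "prior policy", the add-on cost, and every parameter value are assumptions. The planner perturbs the prior policy's actions, scores each rollout by an add-on cost plus a penalty for straying from the prior, and returns the prior action plus a weighted-average correction.

```python
import math
import random

DT = 0.1      # hypothetical time step
TARGET = 1.0  # lateral offset the stand-in prior policy steers toward


def prior_policy(x):
    """Stand-in for a pre-trained policy: steer toward TARGET."""
    return 2.0 * (TARGET - x)


def step(x, action):
    """Toy dynamics: lateral offset integrates the steering action."""
    return x + action * DT


def addon_cost(x):
    """Hypothetical add-on objective: stay close to the centreline (x = 0)."""
    return x * x


def customized_action(x0, horizon=10, samples=1000, sigma=0.5,
                      lam=0.5, omega=0.1, seed=0):
    """Return the prior action plus an MPPI-style weighted correction."""
    rng = random.Random(seed)
    first_noise, costs = [], []
    for _ in range(samples):
        x, cost = x0, 0.0
        for t in range(horizon):
            eps = rng.gauss(0.0, sigma)  # perturbation around the prior action
            if t == 0:
                first_noise.append(eps)
            x = step(x, prior_policy(x) + eps)
            # Add-on cost plus a penalty for deviating from the prior action.
            cost += addon_cost(x) + omega * eps * eps / (2.0 * sigma * sigma)
        costs.append(cost)
    best = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(weights)
    correction = sum(w * e for w, e in zip(weights, first_noise)) / total
    return prior_policy(x0) + correction
```

In this toy, the prior policy steers away from the centreline while the add-on cost pulls back toward it, so the customized action ends up smaller than the prior's. The prior itself is never modified; only its output is corrected at run time.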

Can a Champion Agent Race Without Privileged Information?

The most ambitious of the three papers addresses the longest-standing barrier between GT Sophy and the physical world. The original agent relied on global features available in simulation: precise track geometry, opponent positions, velocities, and accelerations. These are straightforward to access inside a simulator but impractical with real-world sensors.

"A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7" (Lee, Seno, Tai, Subramanian, Kawamoto, Stone, Wurman; RA-L 2025) eliminates that dependency. The agent operates from ego-centric camera views and onboard IMU data alone. It uses the same asymmetric actor-critic architecture that would later inform Ace: the actor relies on local features only, while the critic accesses global features during training. A recurrent neural network enables the actor to infer opponent positions and track layouts from partial observations.
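The actor/critic split described here can be shown structurally with a minimal sketch. It uses linear functions in place of neural networks and omits the recurrent component; the class names, feature layout, and weights are hypothetical. The point is only which inputs each part is allowed to see.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    local: List[float]       # assumed: camera/IMU features, available when deployed
    privileged: List[float]  # assumed: simulator-only state, e.g. opponent positions


def dot(weights, values):
    """Plain dot product, standing in for a neural network layer."""
    return sum(w * v for w, v in zip(weights, values))


class Actor:
    """Policy: sees local features only, so it can run without the simulator."""

    def __init__(self, weights):
        self.weights = weights

    def act(self, obs):
        return dot(self.weights, obs.local)


class Critic:
    """Value estimate: sees local and privileged features, used in training only."""

    def __init__(self, weights):
        self.weights = weights

    def value(self, obs, action):
        return dot(self.weights, obs.local + obs.privileged + [action])
```

Changing the privileged part of an observation changes the critic's estimate but cannot change the actor's output, which is why the trained policy can be deployed where that information does not exist.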

Evaluated across three scenarios (Tokyo Expressway, Spa-Francorchamps, and Le Mans), the vision-based agent consistently secured first place against 19 opponents, even when starting from the back of a 20-car grid. On the Tokyo track, the vision-based agent outperformed all baselines, including the original GT Sophy, because vision allowed it to perceive opponent orientation, something the original point-mass representation could not capture.

The paper's conclusion is direct: "To our knowledge, this work presents the first vision-based autonomous racing agent to demonstrate champion-level performance in competitive racing scenarios."

A vision-based policy that performs at champion level using only onboard sensors represents a meaningful step toward transferring learned racing behaviors to physical hardware.

Why the Hardest Problems Still Matter

In August 2025, Sony AI's Deep RL team reflected on why they remain committed to reinforcement learning in an era dominated by large language models.

"If you care about AI, you can't avoid reinforcement learning because the problem it describes is everywhere in AI," said James MacGlashan, Senior Staff Research Scientist.

Wurman, reflecting on the journey from warehouse robotics to racing AI to the broader frontier, has consistently emphasized practical rigor over theoretical ambition. At IJCAI in 2022, presenting the GT Sophy research to the international AI community, he framed the work in terms of its potential.

"Our work in this area demonstrates the power of AI to deliver new gaming and entertainment experiences for both creators and players. It is our hope that GT Sophy will help inspire players to reach new levels in their technique and creativity," he said.

Stone traced an unexpected lineage. Reinforcement learning from human feedback, or RLHF, was "one of the main differences between GPT-3 and GPT-3.5, which became known as 'ChatGPT,'" he noted. The earliest work in that area predates Sony AI. Brad Knox, one of Stone's former PhD students at the University of Texas at Austin, introduced the first RLHF system, TAMER, in 2008. James MacGlashan and colleagues followed with a variant called COACH.

"Our agents need to be able to generalize across situations; to adapt like humans do, not retrain from scratch every time the environment changes," said Harm van Seijen, Staff Research Scientist, naming the frontier that remains.

The Platform Continues

GT Sophy began as a research challenge, asking: Could reinforcement learning produce an agent capable of beating world champions in a realistic racing simulator? The answer came in February 2022.

What followed was not a conclusion but a series of transitions.

From breakthrough to limited release.

From four cars to more than five hundred. From time-limited event to permanent feature.

From single milestone to lineage, as techniques explored in GT Sophy research carried forward into Ace's physical AI system.

Wurman has described video games as capturing "a unique subset of human intelligence": the only domain where social interaction, reasoning, strategic planning, and rule-following converge in ways that make them uniquely challenging for AI. GT Sophy is evidence that reinforcement learning can meet that challenge, and the research it continues to generate suggests the hardest problems are still ahead.

Devlic captured the team's forward-looking view.

"We hope to have GT Sophy offered on many more tracks as well as at different performance levels that are personalized to and can challenge each individual player. It would be great to have the human players and the AI agent continuously learn from each other in order to improve their performance in the future."

Now, in 2025 and beyond, GT Sophy serves as a research platform for the next set of open questions:

Automated reward design asks whether the costliest step in RL can be bypassed with language models.
Online policy customization asks whether champion-level agents can be adapted without retraining.
Vision-based racing asks whether policies trained in simulation can operate without the information advantages that only simulation provides.

The frontier remains open. That is precisely what makes GT Sophy, five years on, more than a milestone: It is a platform for the questions that come next.

To learn more about GT Sophy and Sony AI's reinforcement learning research, visit ai.sony.
