From Hypothesis to Reality: The GT Sophy Team Explains the Evolution of the Breakthrough AI Agent
January 11, 2024
Since its inception in 2020, Sony AI has been committed to enhancing human imagination and creativity through the acceleration of AI research and development. One of the first examples of this work can be found within the organization’s gaming flagship: Gran Turismo Sophy™ (GT Sophy).
What started out as a grand challenge to create an AI agent that could beat the world’s best Gran Turismo™ drivers evolved into a quest to deliver a new and exciting gaming experience to players around the world through AI. This involved the highly technical evolution of GT Sophy from a research outcome to an actual game feature that could be introduced as part of the Gran Turismo™ 7 (GT7) PlayStation® racing simulation game. Less than two years after GT Sophy appeared on the cover of Nature, the breakthrough AI agent has become a permanent in-game feature of GT7.
The success of this project can be directly attributed to the diligent efforts of a dedicated team of research scientists, engineers, and various other experts. The team – in conjunction with Polyphony Digital Inc. (PDI) and Sony Interactive Entertainment – has created numerous AI approaches and training and evaluation methods. This includes, most notably, a novel deep reinforcement learning approach called Quantile Regression Soft Actor-Critic (QR-SAC) and a never-before-built infrastructure platform to accommodate this specific project.
We recently asked some of the GT Sophy research and development team to share their insights on the evolution of the project – including the challenges they encountered along the way, what they are most proud of, and what the future holds for GT Sophy.
How would you describe the evolution of GT Sophy over the past two years?
Kaushik Subramanian, Senior Research Scientist: It has been an incredible journey for the team – from training GT Sophy to complete a single lap to competitive racing with top GT drivers. Over the past two years, the team has worked to unlock increasingly complex capabilities of the AI agent, from learning to drive most of the cars in the game to executing precise drifting maneuvers. The exciting part of the evolution has been that with every step we take in developing GT Sophy, we identify new frontiers and opportunities for us to improve the AI and the racing experience of millions of GT players.
Takuma Seno, AI Engineer: I have been working on the GT Sophy project for more than three years and it has been exciting to see its evolution. At the very beginning, the agent struggled to drive alone and the ability to have it race against others was something beyond our imagination. Because of our work leading up to the Nature publication and the limited-time release as well as all of the development since, what was once unimaginable is now real.
Florian Fuchs, AI Engineer: In 2020, I had the chance to participate in a meeting with Kazunori Yamauchi, Producer of the Gran Turismo series, as part of my internship with Sony AI in Tokyo. During that meeting, Yamauchi-san asked me how long I thought it would take to get our proof of concept racing AI into the game in some form. I was somewhat perplexed by that question since I didn’t even think that far ahead back then. At that point in time, my feeling was that our prototype was so far away from being useful for the game developers because of the long list of challenges we had to tackle. Many of those challenges, such as driving in a “versus” scenario with other cars or driving on a track without walls, were tackled as part of the work published in Nature. However, after the publication, one major challenge remained: we needed to make our agent stable enough to be able to race against and satisfy the enormous Gran Turismo player base. From the Race Together events to the time-limited release earlier this year, we had to take on the task of broadening our test field so that we could create an agent stable enough to race against all levels of players no matter which car they use – which was a different beast of a challenge.
How do you take GT Sophy from being able to drive four cars in February 2023 to over 300 cars in November 2023?
Patrick MacAlpine, Research Scientist: One of the difficulties of learning a policy that can drive hundreds of cars is that the full set of cars has a wide variety of dynamics and driving styles. We have found that a policy is able to learn to effectively drive different types of cars by being given car-specific features that help determine driving characteristics, such as weight, horsepower, and length. However, of the roughly 50 car-specific features we have access to, only a subset is useful in deciding how the policy should control the car. A technique we use to determine which car features are most important, and should be given as input to the policy, is SHAP analysis. SHAP (SHapley Additive exPlanations) uses a game-theoretic approach to compute the influence each input feature to a policy has on the policy's output. Through the use of SHAP analysis, we discovered that one of the most important features is the drivetrain system of a car, while other features such as the dry weight of the car do not accurately capture the dynamical behavior of a car in a race.
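To make the idea behind SHAP concrete, here is a minimal sketch that computes exact Shapley values for a toy function by averaging each feature's marginal contribution over every feature ordering. It does not use the shap package and it is not the team's analysis code: the function toy_policy, its coefficients, and the feature names awd, power, and weight are all hypothetical, chosen only so that a drivetrain-like feature matters much more than weight.

```python
from itertools import permutations


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to baseline (enumerates all orderings)."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)  # start with every feature at its baseline value
        previous = f(current)
        for name in order:
            current[name] = x[name]  # switch this feature to its real value
            value = f(current)
            totals[name] += value - previous  # marginal contribution in this ordering
            previous = value
    return {name: total / len(orders) for name, total in totals.items()}


def toy_policy(features):
    # Hypothetical stand-in for one policy output (for example a throttle value).
    # This is an assumption for illustration, not GT Sophy's network.
    return (0.5 * features["awd"]
            + 0.3 * features["awd"] * features["power"]
            + 0.01 * features["weight"])


car = {"awd": 1.0, "power": 1.0, "weight": 1.0}
reference = {"awd": 0.0, "power": 0.0, "weight": 0.0}
phi = shapley_values(toy_policy, car, reference)
ranking = sorted(phi, key=phi.get, reverse=True)  # most influential feature first
```

Enumerating orderings grows factorially with the number of features, so for around 50 features a practical analysis would rely on an approximation rather than this brute-force version.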
Florian: As we continued to refine the features used to describe the car GT Sophy was controlling, we had to make sure the overall training time would not scale with the number of cars the agent was learning about. In our initial experiments, GT Sophy performed significantly worse than AI agents that had learned to drive a single car. To overcome this, we investigated how to best inform GT Sophy about the dynamics of the car that it would be controlling. When searching over different types of neural network architectures, we found that GT Sophy makes much better use of static information (like the features that describe a car model) if it is handled through a separate encoding channel, such that the agent can first learn a separate representation of non-static information (position on the track, position of opponent cars, and others) that is shared among all car models before mixing it with the stream of static car-specific information. Through this approach, we were able to get the performance of the multi-car GT Sophy model much closer to the one-car model, and create interesting racing situations for GT players.
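The separate-channel idea can be sketched as a tiny two-stream network: one encoder sees only the non-static observations, another sees only the static car features, and the two encodings are concatenated before the action head. This is an illustrative forward pass with random weights and no training; the class name TwoStreamPolicy, the layer sizes, and the two-value action output are assumptions, not details of GT Sophy's actual architecture.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(rng, n_in, n_out):
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TwoStreamPolicy:
    """Hypothetical two-stream policy: dynamic and static inputs are encoded separately."""

    def __init__(self, n_dynamic, n_static, hidden=8, n_actions=2, seed=0):
        rng = random.Random(seed)
        self.dynamic_encoder = init_layer(rng, n_dynamic, hidden)  # shared across all cars
        self.static_encoder = init_layer(rng, n_static, hidden)    # car-specific stream
        self.head = init_layer(rng, 2 * hidden, n_actions)

    def encode_dynamic(self, dynamic):
        return relu(linear(*self.dynamic_encoder, dynamic))

    def encode_static(self, static):
        return relu(linear(*self.static_encoder, static))

    def act(self, dynamic, static):
        # The streams are mixed only here, after each has its own representation.
        mixed = self.encode_dynamic(dynamic) + self.encode_static(static)
        return [math.tanh(a) for a in linear(*self.head, mixed)]


policy = TwoStreamPolicy(n_dynamic=4, n_static=3)
action = policy.act([0.1, -0.2, 0.3, 0.0], [1.0, 0.0, 0.5])
```

Because encode_dynamic never receives the car features, whatever it learns about track position and opponents is by construction shared across every car model.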
Takuma: In addition to being able to drive over 300 cars, GT Sophy can drive each of them with any of the nine dry tire compounds available in the game. Each tire has different friction limits, which changes the speed of cars by a wide margin. To give the best flexibility to the game design team, we trained GT Sophy to generalize over all nine tires. As a result, GT Sophy is capable of switching tires to fit in various situations in the game.
What were the steps involved in training GT Sophy to be accessible to the average GT racer?
Alisa Devlic, Senior Research Scientist: There were three main challenges to solve in this project: 1) to train an agent to control the car during the race and understand complex car dynamics that can enable it to operate on the edge of control; 2) to train the agent to master complex maneuvers such as slipstream passing, crossover passes, and blocking; and 3) to conform to racing etiquette against different kinds of racers. The third challenge was especially difficult to teach GT Sophy – it had to be competitive and at the same time be a good sport, respecting the racing etiquette rules. The complexity of this challenge made it a formidable task for our team, as we did not have precisely defined rules available to us or a dataset to learn from. Penalty decisions following car contact in a race are usually highly contextual and are made subjectively by human judges after watching the race replays and seeing how the collision affected the cars involved.
We had to train GT Sophy to be a good and well-mannered driver on multiple cars and tracks, even in crowded conditions and tight and slow corners. When you have more cars to learn about, there is a wider range of possible interactions between GT Sophy and its opponents. And the more precisely we tried to define the collision situations, the more aggressive the agent became.
The performance improved when we made changes in the number of opponents and initial positions during the training and adjusted the sportsmanship rewards to handle situations with many cars in close proximity. We were then able to balance GT Sophy’s performance from super-human behavior to levels better suited to average human players.
Florian: One of our main efforts in developing a version of GT Sophy stable enough to race against all player levels – no matter their car – was creating elaborate tests to evaluate GT Sophy in a variety of situations that can appear in races against humans. Collecting meaningful metrics with those tests, like the number of collisions in races with various car types, was, however, only the first half of the job. Additionally, we had to create new tools to compare the performance of our agents over this very high-dimensional space of metrics. For this, we created an elaborate process that first filters policies based on a minimal set of requirements, such as not exceeding a certain number of collisions per test race or not exceeding a certain lap time threshold. We then picked all policies lying on the Pareto border of those combined metrics and ran them through a more thorough evaluation process, including races with every possible car covered in the game. I believe that, in this regard, our project is unique compared to the limited evaluation techniques that are featured in reinforcement learning literature.
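The two-stage selection described here, a minimum-requirements filter followed by picking the policies on the Pareto border (often called the Pareto front), can be sketched in a few lines. The sketch assumes only two metrics where lower is better; the PolicyResult type, the thresholds, and the candidate numbers are invented for illustration and are not results from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyResult:
    name: str
    collisions_per_race: float  # lower is better
    lap_time_s: float           # lower is better


def passes_minimum(p, max_collisions, max_lap_time_s):
    """Stage 1: hard requirements every candidate must meet."""
    return p.collisions_per_race <= max_collisions and p.lap_time_s <= max_lap_time_s


def dominates(a, b):
    """a dominates b if it is no worse on every metric and strictly better on at least one."""
    no_worse = a.collisions_per_race <= b.collisions_per_race and a.lap_time_s <= b.lap_time_s
    better = a.collisions_per_race < b.collisions_per_race or a.lap_time_s < b.lap_time_s
    return no_worse and better


def shortlist(results, max_collisions, max_lap_time_s):
    """Stage 2: keep only the non-dominated policies among those that passed stage 1."""
    eligible = [p for p in results if passes_minimum(p, max_collisions, max_lap_time_s)]
    return [p for p in eligible if not any(dominates(q, p) for q in eligible)]


# Hypothetical candidates, made up for this example.
candidates = [
    PolicyResult("A", 0.5, 92.0),
    PolicyResult("B", 1.0, 90.5),
    PolicyResult("C", 1.0, 93.0),  # dominated by both A and B
    PolicyResult("D", 4.0, 89.0),  # fastest, but too many collisions
    PolicyResult("E", 0.2, 97.0),  # cleanest, but too slow
]
finalists = [p.name for p in shortlist(candidates, max_collisions=2.0, max_lap_time_s=95.0)]
```

Here A and B both survive because neither beats the other on both metrics at once; choosing between them is left to the more thorough evaluation that follows.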
What were some of the challenges and/or surprises around the AI development for this project?
Kaushik: There have been several challenges and surprising elements the team has faced in this development. One surprising outcome was related to balancing GT Sophy's driving ability to make it accessible to more racers. In some experiments, we encouraged GT Sophy to explore different ways in which the tire slip of a car can be controlled. In an extreme version of that control, we observed that GT Sophy had learned to drive slower. The AI agent accomplished that goal by learning to control the front and rear tire slip to generate drifting and 360-degree looping maneuvers while making progress along the track. At the start of the experiment, this was not an outcome we had explicitly considered, but GT Sophy surprised us with what it was able to learn. We were able to further analyze and intentionally train this behavior, as demonstrated in a recent GT exhibition event in Amsterdam.
Patrick: One challenge we encountered when racing against human drivers is that they often perform actions that are unexpected, which results in situations that were never experienced by GT Sophy during training. When these unexpected situations occur, this can result in the AI effectively panicking and veering off the road because it has no idea what to do in that situation. I sometimes like to think of this as if a UFO has suddenly appeared in front of the AI’s car.
To account for these situations, we recorded the state of the game when something unexpected or bad happened. We later tried to recreate those situations in additional tasks as part of our training regime so that the AI could learn how to act better in them. An alternative approach was to make our state representation more generalizable, such that something that has never been seen before by the AI will still look similar to something that has likely been experienced in training. As an example, during testing we found that GT Sophy would sometimes spin out and crash if it saw four or more cars in front of it (training situations with that many cars directly in front of GT Sophy rarely occurred). With a deadline looming, to make these situations with many cars look more like those seen in training, we limited the number of cars it would need to reason about.
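One simple way to limit how many cars an agent reasons about is to keep only the nearest few opponents in its observation and mask the unused slots. The sketch below does this by straight-line distance; the interview does not say how the cars were actually chosen, so the distance-based selection, the default of three cars, and the function name select_opponents are all assumptions.

```python
import math


def select_opponents(ego_xy, opponents_xy, max_cars=3):
    """Return relative positions of the nearest opponents plus a validity mask.

    The output always has max_cars slots, so the observation size stays fixed
    regardless of how many cars are actually nearby.
    """
    nearest = sorted(opponents_xy, key=lambda p: math.dist(ego_xy, p))[:max_cars]
    relative = [(x - ego_xy[0], y - ego_xy[1]) for x, y in nearest]
    mask = [1] * len(relative)
    # Pad empty slots; the mask tells the policy to ignore them.
    padding = max_cars - len(relative)
    relative += [(0.0, 0.0)] * padding
    mask += [0] * padding
    return relative, mask


# Five cars nearby, but the observation only ever contains three of them.
crowded = [(10.0, 0.0), (2.0, 1.0), (30.0, 5.0), (5.0, -1.0), (50.0, 0.0)]
observed, valid = select_opponents((0.0, 0.0), crowded)
```

With a cap like this, a pack of five cars produces the same observation shape as a pack of three, which is closer to what the agent saw during training.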
What additional processes, methods or steps were needed to make this permanently available in GT7?
Takuma: To make GT Sophy a permanent component of the game, we built a productization pipeline to deploy it into the game, which required additional engineering effort. With this pipeline, we can easily deliver various types of AI models to production. I believe that this productization pipeline will also help accelerate the future R&D cycle of GT Sophy. We also needed to make our deep learning models executable on the PS5 console. We leveraged Sony’s software stacks, such as Neural Network Libraries, and added further optimizations to execute AI inference as fast as possible. It was a fantastic experience for me to embed our AI models into the actual gaming console and see the agent behaving in real time. There aren’t many places in the world that can provide the same opportunity.
Kaushik: An important part of the process was the collaboration with Polyphony Digital (PDI). We relied on their expertise and feedback to make sure the trained policy resulted in a well-rounded agent. The interactions with PDI helped us identify issues with missing state features, training scenarios, and reward functions. In particular, it was helpful to explain why the policy was struggling to drive certain cars in the game, like the Tomahawk and the Kart. Once the training was complete, we would need to evaluate the policies in situations that they are likely to experience in the world. Here PDI’s in-house experts would playtest the policy under different conditions and give us feedback to further improve the performance.
Patrick: Transitioning GT Sophy from a research project to a permanent part of GT7 is in many ways more challenging than the project's original goal of competing against and beating the best players in the world. Before making it into the game, GT Sophy’s ability to drive every car on every track needed to be carefully evaluated, and doing that evaluation manually had the potential to take PDI hundreds of hours. Given the limited time between training the policies and finalizing what would make it into the game, we quickly automated the tests that would otherwise be performed manually. With the help of automated tests evaluating GT Sophy’s performance while running on over 1,000 PlayStations in parallel, we were able to aid PDI in meeting the tight deadline for getting GT Sophy into the GT7 Spec II Update. The gauntlet of tests included ones in which GT Sophy started in last position and tried to pass as many of the built-in AI cars as possible, and 20-car one-make races in which 20 identical cars, all controlled by GT Sophy, raced together. Even with our best policies, some cars had to be excluded because GT Sophy can’t reliably drive them yet.
What are you most proud of with regard to your own and the overall team’s work on this project?
Takuma: In the February Race Together release, GT Sophy was the first deep reinforcement learning agent deployed to an actual console game in PlayStation history. Now, in this release, GT Sophy is one of the first deep reinforcement learning agents to become a permanent component of a console game. This is a huge accomplishment in terms of the history of both AI and video game development. I'm very proud of being a part of this historical achievement.
Alisa: Yes, I am proud that we managed to achieve these challenging goals: publishing in the Nature journal and a commercial deployment of GT Sophy worldwide in just three years!
Kaushik: Taking this from a research project to a permanent commercial release has a unique set of challenges. Being on that journey with this team and reaching this milestone together with support from PDI has been a very rewarding experience. There were moments along the project where the AI agent's learned behavior was moving away from our target requirements, and it felt like we were not making progress in the intended direction. While these were rare moments, the team was always ready to step up and figure out innovative approaches to bring GT Sophy back on track.
What does the future hold for GT Sophy?
Alisa: Now that GT Sophy is deployed permanently in the game, it is meant to be used for both entertaining people and improving their racing skills. We hope to have GT Sophy offered on many more tracks as well as at different performance levels that are personalized to and can challenge each individual player. It would be great to have the human players and the AI agent continuously learn from each other in order to improve their performance in the future.
Kaushik: While our current version of GT Sophy is a competitive AI racing agent, we understand there are skills it still needs to learn, such as handling more cars and tracks in the game and driving proficiently in dynamic weather conditions or long endurance races. We imagine a future where we will continue to be surprised by the novel ways in which GT Sophy improves its race craft.
To learn more about GT Sophy as a permanent in-game feature as part of GT7, read on: https://ai.sony/articles/sonyai022/.
See GT Sophy in action as part of GT7 here: https://youtu.be/QjWZWNLahBk?si=EY7wlYLvGFaAQt0Y