Advancing AI: Highlights from April
Sony AI
May 1, 2025
April marked a standout moment for Sony AI as we brought bold ideas to the world stage at ICLR 2025. This year’s conference spotlighted our work across generative modeling, multimodal alignment, audio processing, and behavior learning — showing how AI can power richer, more intuitive human-machine interactions.
Whether it’s making sound and video feel more in sync, helping creators edit audio more naturally, or teaching AI to understand physical space and human behavior, these papers reflect our mission to develop AI that expands creative and collaborative possibilities.
In addition, check out our recent interviews with GTPlanet and PRINT to learn more about the latest developments for GT Sophy, and read on for a recap of our recent SXSW fireside chat.
Latest Research
SoundCTM: Making Audio Generation Smarter and Faster
In audio synthesis, speed matters. SoundCTM (Sound Consistency Trajectory Model) is our solution: it distills diffusion-based audio generation into a consistency trajectory model, so sound can be produced in a single fast step or refined over multiple steps, in a structured, controllable way.
Why it matters:
- Generates diverse, high-quality audio faster than traditional diffusion models.
- Understands audio structure, enabling smoother sound transitions.
- Opens new doors for creators in music, gaming, and film.
“SoundCTM allows creators to efficiently perform trial-and-error with 1-step generation and later refine their work through high-quality multi-step sampling, all while preserving the intended meaning of the sound.”
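The draft-then-refine workflow in the quote can be pictured with a small sketch. The sampler below is an illustrative assumption, not the SoundCTM code: it supposes a single trained network `g(x, t, s)` that can jump from any noise level `t` to any earlier level `s` along the diffusion trajectory, so `steps=1` gives a fast draft and more steps trade speed for quality.

```python
import numpy as np

def sample(g, shape, steps, rng=None):
    """Draw a sample with `steps` trajectory jumps.

    `g(x, t, s)` is a hypothetical consistency-trajectory network that maps
    a noisy sample at level t to level s. steps=1 gives a one-step draft;
    larger values refine the result over more, smaller jumps.
    """
    rng = rng or np.random.default_rng()
    x = rng.standard_normal(shape)              # start from pure noise
    # Noise levels from 1.0 (pure noise) down to 0.0 (clean audio latent)
    levels = np.linspace(1.0, 0.0, steps + 1)
    for t, s in zip(levels[:-1], levels[1:]):
        x = g(x, t, s)                          # jump from level t to s
    return x
```

The same network serves both modes: a creator can audition ideas with `sample(g, shape, steps=1)` and re-render a chosen draft with, say, `steps=8` for higher fidelity.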
MMDisCo: Aligning Audio and Video Without Retraining
Multimodal models are typically massive and resource-heavy. MMDisCo (Multi-Modal Discriminator-Guided Cooperative Diffusion) offers a smarter alternative: it aligns pre-trained audio and video models using a shared discriminator — no full retraining required.
Why it matters:
- Drastically cuts down computational cost for multimodal generation.
- Ensures synchronized outputs across audio and video, ideal for dubbing, animation, and accessibility.
- Shows how independent models can still "collaborate" effectively.
“Our method enables creators to generate well-aligned audio and video outputs without retraining or redesigning existing models, significantly reducing both cost and effort.”
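The idea of coordinating two frozen models through a joint discriminator can be sketched as a guidance step. This is a minimal illustration under our reading of discriminator guidance, not the MMDisCo implementation; all function names here are assumptions. Each base model denoises its own modality unchanged, while the gradient of the joint discriminator's log-score nudges both trajectories toward aligned audio-video pairs.

```python
import numpy as np

def guided_step(denoise_a, denoise_v, grad_log_d, a, v, t, weight=1.0):
    """One cooperative denoising step at noise level t (illustrative sketch).

    denoise_a / denoise_v: frozen per-modality denoisers (weights untouched).
    grad_log_d: gradient of log D(audio, video) w.r.t. each modality, where
    D is a small joint discriminator scoring audio-video alignment.
    """
    a_next = denoise_a(a, t)                 # base audio model, unchanged
    v_next = denoise_v(v, t)                 # base video model, unchanged
    ga, gv = grad_log_d(a_next, v_next)      # alignment gradients
    # Guidance: push each modality toward a higher joint-discriminator score
    return a_next + weight * ga, v_next + weight * gv
```

Because only the small discriminator is trained, the cost is a fraction of retraining either base model, which is the point of the cooperative design.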
SimBa: Scaling Up Reinforcement Learning with Simplicity Bias
SimBa (Simplicity Bias) scales up deep reinforcement learning by building a bias toward simple functions directly into the network architecture. Components such as observation normalization, residual feedforward blocks, and layer normalization let performance improve reliably as networks grow larger, without changing the underlying RL algorithm.
Why it matters:
- Lets deep RL performance scale with network size rather than hand-tuned tricks.
- Great for robotics, gaming, and real-world control tasks.
- Provides a simple, drop-in backbone for existing RL algorithms.
“SimBa simplifies deep reinforcement learning by favoring models that naturally scale with complexity—giving creators and engineers a powerful, easy-to-integrate backbone for complex real-world applications.”
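The "easy-to-integrate backbone" in the quote can be illustrated with one building block. This is a sketch under our reading of the design, not the paper's code: a pre-layer-norm residual feedforward block, a standard construction that keeps the identity map easy to represent and biases deep networks toward simple, well-scaled functions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each feature vector to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, w1, b1, w2, b2):
    """h = x + MLP(LayerNorm(x)).

    The skip connection means the block defaults to the identity function,
    so stacking more blocks adds capacity without destabilizing training.
    """
    h = layer_norm(x)
    h = np.maximum(h @ w1 + b1, 0.0)   # ReLU
    return x + (h @ w2 + b2)
```

Stacking such blocks is what lets the backbone grow with task complexity while remaining a drop-in replacement for a plain MLP in an existing RL agent.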
HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Finetuning
HERO fine-tunes text-to-image diffusion models from live human feedback. By making every annotation count, it lets a model adapt to a user's intent online, steering generation toward the desired output with far fewer labels than conventional feedback-driven finetuning.
Why it matters:
- Dramatically reduces the amount of human feedback needed to steer a model.
- Supports real-time, interactive refinement of generated images.
- Gives creators direct, personal control over model outputs.
“HERO empowers creators to guide generative models with nuanced personal feedback—shaping outputs in real time with fewer annotations and more creative control.”
Together, these papers demonstrate the wide spectrum of Sony AI’s research — from practical advances in sound and video generation to foundational work in agent intelligence.
Sony AI at Global Events
In April, our teams shared the spotlight at ICLR 2025, one of the world’s leading machine learning conferences — contributing to important conversations on model efficiency, personalization, and responsible AI development.
Beyond our core work on generative audio and multimodal modeling, we were proud to showcase these standout efforts:
Jump Your Steps – A new method for optimizing how discrete diffusion models generate content. It allows creators to keep quality high while cutting down on computational time — ideal for real-time applications like interactive media or game design.
Mining Your Own Secrets – This research introduces a way to personalize text-to-image diffusion models using classifier scores, letting users subtly steer outputs toward evolving brand identities or visual styles without retraining from scratch.
Residual-MPPI – An approach to control customization in real-time, making it possible for AI agents (like racing or sports bots) to adapt to a player’s style on the fly — no retraining required.
Weighted Point Set Embedding – An improvement to multimodal contrastive learning that better captures nuanced relationships across modalities (like text and image), boosting performance in search, retrieval, and accessibility use cases.
These papers reflect our deep commitment to building practical, performant AI systems that elevate the creative process and respect real-world constraints.
Explore our full ICLR 2025 Round-Up
Sony AI in the News
Researchers Talk GT Sophy 2.1
Sony AI researchers Kaushik Subramanian and Takuma Seno recently sat down with GTPlanet’s Jordan Greer and Chaz Draycott to discuss the newest version of GT Sophy, now available in Gran Turismo™ 7. The pair detailed GT Sophy 2.1’s capabilities, how the agent was trained, and more.
Michael Spranger Shares Perspective on AI and Creativity at SXSW
Sony AI’s President Michael Spranger took part in an SXSW fireside chat titled “Can Human & AI Collaboration Extend the Bounds of Creativity.” Michael offered his insights on AI and the creative process during this discussion, which were detailed by PRINT.
Learn More About His Perspective
Peter Stone Receives ACM Allen Newell Award
We’re proud to share that Peter Stone, Chief Scientist at Sony AI and Professor at The University of Texas at Austin, has been honored with the prestigious ACM–AAAI Allen Newell Award, presented by the Association for Computing Machinery (ACM), for his career-defining contributions to artificial intelligence. The award recognizes his pioneering work in reinforcement learning, multiagent systems, transfer learning, and intelligent robotics—core areas that continue to shape Sony AI’s research agenda.
“Peter Stone has fundamentally advanced how autonomous agents learn, plan, and collaborate,” said the ACM. “His innovations in reinforcement learning have enabled robots to acquire skills through experience, while his work on multiagent coordination has transformed how agents operate collectively toward shared goals.”
This recognition underscores the impact of Peter’s research not only on Sony AI’s development of collaborative agents and embodied systems, but also on the broader AI community.
Discover More About Peter Stone’s Work
Join the Conversation
Connect with us on LinkedIn, Instagram, or Twitter, and let us know what you’d like to see in future editions. Until next month, keep imagining the possibilities with Sony AI.