Advancing AI: Highlights from May

June 1, 2026

May put sound at the center of Sony AI’s research. From 11 papers at ICASSP that push audio systems to be more honest about what they understand, to Woosh, a foundation model built for the people who design sound for games and film, the month showed careful research and practical tooling moving together. Stay tuned: next month’s June highlights will bring new deep-dives, news updates, and conference coverage.


Stories from the Sony AI Blog

Sony AI at ICASSP 2026: Research Roundup

Sony AI presented 11 accepted papers at ICASSP 2026 in Barcelona, spanning music understanding, generative audio, audio-visual alignment, and data quality. The work shares a disposition: that audio systems should be honest about what they can and cannot do, which calls for better benchmarks, training methods that reflect how music is actually made, and evaluation that tracks human perception. Featured work includes WEALY, a fully reproducible pipeline that matches lyrics directly from audio without transcription; a multi-track approach to automatic music sample identification that improves on prior state of the art by 15%; MEGAMI, the first generative framework for automatic music mixing; FlashFoley, an open-source sketch-to-audio model that generates 11 seconds of stereo audio in 75 milliseconds; and FoleyBench, a benchmark built specifically for Foley-style video-to-audio evaluation. Many of the projects release open code, models, and benchmarks.

Read the full roundup here.


Introducing Woosh: Sony AI’s Sound Effect Foundation Model

Most generative audio research has focused on music or general audio; sound effects, the building blocks that game and film sound designers rely on, were largely overlooked. Woosh is Sony AI’s answer: a foundation model built from the ground up for sound effect generation, designed around the workflows and quality standards professionals actually use. The team trained two versions in parallel, a private model optimized on licensed studio-grade libraries from Pro Sound Effects and BOOM, and a public counterpart trained on openly available datasets and released with open weights and inference code. Woosh handles both text-to-audio and video-to-audio generation, and on the FoleyBench benchmark its video-to-audio model outperforms the comparable baseline on audio quality and semantic alignment while using fewer parameters. The team is also building a plugin for digital audio workstations, with planned controls for variation, inpainting, and personalization, so the model fits the pipelines sound designers already work in.

Read the full piece here.
Explore the model, weights, and demos at sonyresearch.github.io/Woosh.


Sony AI In the News

May’s headline release drew attention well beyond the lab. Woosh landed with the professional audio community and the open research world alike, with coverage centering on a deliberate design choice: that professional sound design needs fundamentally different data and controls than general-purpose audio AI.

Mixonline: The pro-audio trade publication covered the Woosh release, noting that Sony AI trained two versions in parallel, a private model optimized on licensed libraries such as Pro Sound Effects and BOOM, and a public model released for the research community. The piece highlighted Sony AI’s finding that the private model significantly outperforms public alternatives on professional sound effect data.

Read the article here.

Beyond the trade press, the public release circulated through the open-source and research communities, with open weights and inference code on GitHub and demo samples on the project page.

Sony AI in TIME: Are We Entering the Age of Data Nihilism?

Alice Xiang, Lead Research Scientist for AI Ethics at Sony AI, penned an op-ed for TIME Magazine.

The op-ed argues that society is moving toward an era of “data nihilism,” which Xiang describes as a state in which “our data means everything to AI developers yet almost nothing to us—not because our data actually is value-less, but because people feel powerless to stop it from being collected against their will.” Xiang argues that unchecked data practices risk reinforcing bias, misinformation, and systemic inequality, and that stronger transparency, accountability, and ethical data practices offer a chance to do better.

Read the full article here.


Connect with us on LinkedIn, Instagram, or X, and let us know what you’d like to see in future editions. Until next month, keep imagining the possibilities with Sony AI.