From Data Fairness to 3D Image Generation: Sony AI at NeurIPS 2024

Events

December 13, 2024

NeurIPS 2024, the 38th Annual Conference on Neural Information Processing Systems and the largest AI conference in the world, takes place at the Vancouver Convention Centre in Vancouver, B.C., from Tuesday, December 10 through Sunday, December 15.

This year's conference promises groundbreaking developments across diverse fields, from generative AI to privacy-preserving machine learning. Sony AI is proud to showcase its research contributions to the conversation, spanning music and audio synthesis, skill discovery, federated learning, and beyond. Here’s a glimpse at the research we’re presenting, setting the stage for transformative ideas and innovation in AI.

Don’t miss the presentations at our booth, and find more information about our work here: NeurIPS 2024 Schedule

Accepted Research Round-Up

AI Ethics

Data Fairness

Oral presentation

Sony AI’s paper accepted at NeurIPS 2024, “A Taxonomy of Challenges to Curating Fair Datasets,” highlights the pivotal steps toward achieving fairness in machine learning and is a continuation of our team’s deep explorations into these issues. This research also marks an evolution of the work from empirical inquiry into theory, taking a Human-Computer Interaction (HCI) approach. Read the full blog post to learn more.

3D Image Generation, The Complexities of Pitch and Timbre in Music Mixing, Generative AI Watermarking and Audio Compression

Research in generative AI, sound, and audio is moving rapidly, but there are still gaps in its ability to support professional human artistry. Our explorations aim to uncover novel approaches and tools that empower creators to refine and shape their output in real time, with a better understanding of how AI can support the complexities of pitch and timbre in audio mixing or reduce the memory demands that have inhibited real-time text-to-sound generation.

Our work shows new pathways to deter unauthorized use of audio files and facilitate user tracking in generative models by integrating key-based authentication with watermarking techniques. See more details on our NeurIPS papers and workshops below.

LOCKEY: Securing Generative Models

Authors: Mayank Kumar Singh, Naoya Takahashi, Wei-Hsiang Liao, Yuki Mitsufuji

The Problem
Generative AI models are vulnerable to misuse and deepfake creation, with inadequate existing safeguards.

The Solution
LOCKEY uses key-based authentication, embedding user-specific watermarks for secure, traceable outputs.

The Challenge
Ensuring watermarks remain robust and outputs stay high quality while scaling to numerous unique keys.

The Result
LOCKEY secures generative models while maintaining audio quality, setting a new benchmark for AI security.

Read the full research paper
Explore the Demo
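
To make the watermarking idea concrete, here is a minimal, self-contained sketch of key-derived watermark embedding and detection. It is a generic spread-spectrum illustration, not LOCKEY's actual mechanism (which ties the watermark to the generation process itself); the helper names, strength, and threshold are all illustrative choices.

```python
# Toy illustration (not LOCKEY's pipeline): derive a user-specific watermark
# sequence from a secret key, add it to audio, and detect it by correlation.
import hashlib
import numpy as np

def key_to_watermark(user_key: str, length: int) -> np.ndarray:
    """Expand a user key into a pseudo-random +/-1 sequence (hypothetical helper)."""
    seed = int.from_bytes(hashlib.sha256(user_key.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=length)

def embed(audio: np.ndarray, user_key: str, strength: float = 0.01) -> np.ndarray:
    """Add a low-amplitude, key-specific pattern to the waveform."""
    return audio + strength * key_to_watermark(user_key, len(audio))

def detect(audio: np.ndarray, user_key: str, threshold: float = 0.005) -> bool:
    """Correlate the waveform with the key's pattern to test for the watermark."""
    score = float(np.mean(audio * key_to_watermark(user_key, len(audio))))
    return score > threshold

# Usage: watermark a generated clip for one user and trace it back to their key.
clip = np.random.randn(16000) * 0.1                 # stand-in for model output audio
marked = embed(clip, user_key="alice-key")
print(detect(marked, "alice-key"), detect(marked, "bob-key"))   # expect True, then False
```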


DisMix: Mastering Music Mixes

Authors: Yin-Jyun Luo, Kin Wai Cheuk, Woosung Choi, Toshimitsu Uesaka, Keisuke Toyama, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Wei-Hsiang Liao, Simon Dixon, Yuki Mitsufuji

The Problem
Pitch-timbre disentanglement for multi-instrument mixtures has remained largely unaddressed, limiting creative possibilities.

The Solution
DisMix separates and recombines pitch and timbre using latent diffusion transformers and binarization layers.

The Challenge
Avoiding pitch-timbre overlap and reconstructing complex audio mixtures realistically.

The Result
DisMix delivers precise pitch-timbre manipulation, opening new avenues for creative audio applications.


Read the full research paper
View Samples
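
As a rough illustration of the recombination idea, the toy sketch below encodes each source into a pitch latent and a timbre latent, swaps the timbre latents between two sources, and decodes a new mixture. The tiny linear modules are placeholders standing in for DisMix's latent diffusion transformer and binarization layers, which are far more sophisticated.

```python
# Minimal sketch of pitch/timbre swapping with toy stand-in modules.
import torch
import torch.nn as nn

FRAMES, FEATS, LATENT = 100, 64, 16    # toy spectrogram shape and latent size

class SourceEncoder(nn.Module):
    """Map one source's spectrogram to a pitch latent and a timbre latent."""
    def __init__(self):
        super().__init__()
        self.pitch = nn.Linear(FEATS, LATENT)     # per-frame pitch code
        self.timbre = nn.Linear(FEATS, LATENT)    # pooled, time-invariant timbre code
    def forward(self, spec):                      # spec: (FRAMES, FEATS)
        return self.pitch(spec), self.timbre(spec).mean(dim=0)

class MixtureDecoder(nn.Module):
    """Render a mixture from per-source (pitch, timbre) latent pairs."""
    def __init__(self):
        super().__init__()
        self.out = nn.Linear(2 * LATENT, FEATS)
    def forward(self, pairs):
        sources = [self.out(torch.cat([p, t.expand_as(p)], dim=-1)) for p, t in pairs]
        return torch.stack(sources).sum(dim=0)    # (FRAMES, FEATS)

enc, dec = SourceEncoder(), MixtureDecoder()
piano, violin = torch.randn(FRAMES, FEATS), torch.randn(FRAMES, FEATS)
(p_pitch, p_timbre), (v_pitch, v_timbre) = enc(piano), enc(violin)

# Recombine: keep each melody (pitch) but swap the instruments (timbre).
swapped_mix = dec([(p_pitch, v_timbre), (v_pitch, p_timbre)])
print(swapped_mix.shape)   # torch.Size([100, 64])
```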

SoundCTM: Faster Text-to-Sound Generation

Authors: Koichi Saito, Zhi Zhong, Dongjun Kim, Yuhta Takida, Takashi Shibuya, Chieh-Hsin Lai

The Problem
Slow inference speeds in T2S models hinder real-time applications in creative workflows.

The Solution
SoundCTM combines score-based and consistency models, enabling flexible and efficient T2S generation.

The Challenge
Balancing speed with quality while reducing memory demands and ensuring adaptability.

The Result
SoundCTM accelerates sound generation with flexibility, offering unprecedented creative control.


Read the full research
View Samples
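
The speed-versus-quality trade-off rests on consistency-style models that can jump from noise toward a clean sample in a single network call or refine over a few noise levels. The sketch below shows a generic sampler of that kind with a toy network; it is not SoundCTM's exact formulation, and all sizes and noise levels are placeholders.

```python
# Generic consistency-style sampling loop (a sketch, not SoundCTM's sampler):
# the same network yields a latent in one step or refines it over a few steps.
import torch
import torch.nn as nn

class ToyConsistencyModel(nn.Module):
    """Stand-in f(x, sigma) that maps a noisy latent toward a clean latent."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 512), nn.SiLU(), nn.Linear(512, dim))
    def forward(self, x, sigma):
        s = torch.full((x.shape[0], 1), sigma)
        return self.net(torch.cat([x, s], dim=-1))

@torch.no_grad()
def sample(model, batch=4, dim=256, sigmas=(80.0,)):
    """1-step generation when sigmas=(80.0,); more sigmas = more refinement."""
    x = torch.randn(batch, dim) * sigmas[0]
    for i, sigma in enumerate(sigmas):
        x0 = model(x, sigma)                        # jump toward the clean latent
        if i + 1 < len(sigmas):                     # re-noise to the next, smaller level
            x = x0 + sigmas[i + 1] * torch.randn_like(x0)
        else:
            x = x0
    return x

model = ToyConsistencyModel()
draft = sample(model, sigmas=(80.0,))               # fastest: a single network call
refined = sample(model, sigmas=(80.0, 20.0, 5.0))   # slower but typically higher quality
print(draft.shape, refined.shape)
```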

PRISM: Smarter Prompts for Text-to-Image

Authors: Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter

The Problem
Manual prompt engineering for T2I models is tedious, with limited transferability to closed-source systems.

The Solution
PRISM refines prompts iteratively using LLMs and feedback from generated and reference images.

The Challenge
Maintaining interpretability, transferability, and effectiveness across diverse T2I models.

The Result
PRISM creates accurate, interpretable prompts, making T2I workflows faster and more accessible.


Read the full research
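
The refinement loop can be sketched as plain control flow: an LLM proposes a human-readable prompt, a T2I model renders it, a judge scores the render against reference images, and the best-scoring prompt is kept. The functions propose_prompt, generate_image, and similarity below are hypothetical stubs standing in for the real model calls.

```python
# Control-flow sketch of iterative prompt refinement (hypothetical stubs stand in
# for the LLM, the text-to-image model, and the image-similarity judge).
import random

def propose_prompt(history):
    """Hypothetical LLM call: suggest a new prompt given (prompt, score) history."""
    return f"a photo in the reference style, attempt {len(history)}"

def generate_image(prompt):
    """Hypothetical T2I call: return an image for the prompt."""
    return {"prompt": prompt}

def similarity(image, reference_images):
    """Hypothetical judge: how close is the render to the references (0..1)?"""
    return random.random()

def refine_prompt(reference_images, n_iters=10):
    history, best = [], ("", -1.0)
    for _ in range(n_iters):
        prompt = propose_prompt(history)                # LLM proposes a candidate
        score = similarity(generate_image(prompt), reference_images)
        history.append((prompt, score))                 # feedback for the next proposal
        if score > best[1]:
            best = (prompt, score)
    return best[0]                                      # human-readable, reusable prompt

print(refine_prompt(reference_images=["ref1.png", "ref2.png"]))
```

Because the output is an ordinary text prompt rather than model-specific embeddings, it can in principle be carried over to closed-source T2I systems, which is the transferability point the summary above makes.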

Di4C: Accelerating Discrete Diffusion

Authors: Satoshi Hayakawa, Yuhta Takida, Masaaki Imaizumi, Hiromi Wakaki, Yuki Mitsufuji

The Problem
Discrete diffusion models are slow because each denoising step ignores correlations between dimensions, forcing many steps to reach high-quality samples.

The Solution
Di4C distills multi-step denoising into fewer steps, using correlation-aware losses and mixture modeling to capture dependencies between dimensions.

The Challenge
Preserving accuracy while compressing processes and scaling for high-dimensional data.

The Result
Di4C boosts efficiency, cutting sampling times while maintaining high-quality outputs.

Read the full research paper
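
In the same spirit, a toy distillation setup might look like the following: a factorized teacher denoises one step at a time, and a student with a small mixture over factorized components is trained to reproduce two teacher steps in one, so that a single step can express correlations across dimensions. The networks, shapes, and negative-log-likelihood loss below are simplified stand-ins, not the paper's objectives.

```python
# Toy "2 steps into 1" distillation for a discrete denoiser with a mixture student.
import torch
import torch.nn as nn
import torch.nn.functional as F

D, V, K = 8, 16, 4   # toy number of dimensions, vocabulary size, mixture components

class Teacher(nn.Module):
    """Toy factorized denoiser: independent logits per dimension for one step."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(D * V, 128), nn.ReLU(), nn.Linear(128, D * V))
    def forward(self, x_onehot):                       # (B, D, V)
        return self.net(x_onehot.flatten(1)).view(-1, D, V)

class StudentMixture(nn.Module):
    """Mixture of K factorized components, so one step can express correlations."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(D * V, 128), nn.ReLU())
        self.mix = nn.Linear(128, K)
        self.comp = nn.Linear(128, K * D * V)
    def forward(self, x_onehot):
        h = self.trunk(x_onehot.flatten(1))
        return self.mix(h), self.comp(h).view(-1, K, D, V)

def teacher_two_steps(teacher, x):                     # x: (B, D) token ids
    for _ in range(2):
        logits = teacher(F.one_hot(x, V).float())
        x = torch.distributions.Categorical(logits=logits).sample()
    return x

teacher, student = Teacher(), StudentMixture()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(200):
    x_t = torch.randint(0, V, (64, D))                 # toy noisy inputs
    with torch.no_grad():
        target = teacher_two_steps(teacher, x_t)       # what two teacher steps produce
    mix_logits, comp_logits = student(F.one_hot(x_t, V).float())
    # log p_student(target | x_t) = logsumexp_k [ log w_k + sum_d log p_k(target_d) ]
    idx = target[:, None, :, None].expand(-1, K, -1, -1)
    logp_d = comp_logits.log_softmax(-1).gather(-1, idx).squeeze(-1)   # (B, K, D)
    logp = torch.logsumexp(mix_logits.log_softmax(-1) + logp_d.sum(-1), dim=-1)
    loss = -logp.mean()                                # distill two steps into one
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final distillation loss: {loss.item():.3f}")
```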

VRVQ: Smarter Audio Compression

Authors: Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo, Yukara Ikemiya, Zhi Zhong, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee, Wei-Hsiang Liao, Yuki Mitsufuji

The Problem
Constant bitrate audio compression wastes resources on simple inputs like silence.

The Solution
VRVQ introduces an “importance map” for dynamic bit allocation and efficient training.

The Challenge
Designing robust importance maps and handling masking functions within RVQ architectures.

The Result
VRVQ achieves better rate-distortion tradeoffs, setting a new standard for neural audio codecs.


Read the full research paper
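
The allocation idea can be illustrated with a toy residual vector quantizer in which a per-frame importance score decides how many quantization stages each frame receives. Here the importance map is simply frame energy; in VRVQ it is learned end-to-end with the codec, and the masking is differentiable, so the sketch only conveys the bit-allocation intuition.

```python
# Toy variable-bitrate residual quantization: frame "importance" (here, energy)
# decides how many quantizer stages are spent on each frame.
import numpy as np

rng = np.random.default_rng(0)
N_STAGES, CODEBOOK, DIM, FRAMES = 8, 64, 16, 6
codebooks = rng.normal(size=(N_STAGES, CODEBOOK, DIM))

def quantize_frame(x, n_stages):
    """Greedy RVQ: each stage quantizes the residual left by the previous stages."""
    recon, codes = np.zeros_like(x), []
    residual = x.copy()
    for s in range(n_stages):
        idx = int(np.argmin(np.sum((codebooks[s] - residual) ** 2, axis=1)))
        codes.append(idx)
        recon += codebooks[s, idx]
        residual = x - recon
    return recon, codes

frames = rng.normal(size=(FRAMES, DIM))
frames[0] *= 0.01                               # a near-silent frame
importance = np.linalg.norm(frames, axis=1)     # stand-in importance map
importance /= importance.max()

for i, frame in enumerate(frames):
    n = max(1, int(round(importance[i] * N_STAGES)))   # bits follow importance
    recon, codes = quantize_frame(frame, n)
    err = np.linalg.norm(frame - recon)
    print(f"frame {i}: {n} stages, {len(codes) * 6} bits, error {err:.3f}")
```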

Discovering Creative Behaviors through Reinforcement Learning (RL)

At NeurIPS 2024, Sony AI highlights the transformative potential of reinforcement learning (RL) through research designed to push beyond traditional limitations. A central focus is on advancing RL’s adaptability in dynamic and complex environments. These efforts explore how RL agents can balance diverse, context-specific strategies with optimal performance, unlocking new possibilities in applications like gaming, robotics, and beyond.

One key contribution, DUPLEX, introduces a diversity objective and successor features, enabling RL agents to discover multiple near-optimal policies rather than converging on single solutions. Demonstrated in Gran Turismo™ 7 and robotics tasks, DUPLEX sets a new benchmark for diversity-focused RL by achieving adaptive and effective behaviors in out-of-distribution scenarios. This work underscores Sony AI’s commitment to advancing RL methodologies for real-world challenges.

DUPLEX

Authors: Borja G. Leon, Francesco Riccio, Kaushik Subramanian, Pete Wurman, Peter Stone

The Problem
Reinforcement learning (RL) agents tend to settle on single solutions, limiting adaptability to diverse and dynamic environments.

The Solution
DUPLEX introduces a diversity objective and successor features, enabling agents to discover multiple near-optimal policies tailored to specific contexts.

The Challenge
Balancing the trade-off between diversity and optimality in highly dynamic or out-of-distribution (OOD) tasks like driving simulators and robotics.

The Result
DUPLEX achieved diverse driving styles in Gran Turismo™ 7 and effective policies in OOD robotics scenarios, setting a new benchmark for diversity-focused RL.


Read the research.
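
As a sketch of the diversity objective described above, the snippet below computes successor features (discounted sums of state features observed under a policy) for several policies and rewards each policy for keeping its successor features far from its peers', on top of the task return. The rollouts, feature dimensions, and weighting are toy stand-ins, not DUPLEX's actual objective.

```python
# Toy diversity bonus from successor features: a policy earns extra reward for
# keeping its successor features far from its peers'.
import numpy as np

def successor_features(phi_trajectory, gamma=0.99):
    """Discounted sum of state features observed under one policy's rollout."""
    psi = np.zeros(phi_trajectory.shape[1])
    for t, phi in enumerate(phi_trajectory):
        psi += (gamma ** t) * phi
    return psi

def diversity_bonus(psi_i, peer_psis):
    """Distance to the nearest peer's successor features."""
    return min(np.linalg.norm(psi_i - psi_j) for psi_j in peer_psis)

rng = np.random.default_rng(0)
rollouts = [rng.normal(size=(200, 8)) for _ in range(4)]   # 4 policies, toy features
psis = [successor_features(r) for r in rollouts]

task_returns = rng.normal(loc=10.0, size=4)                # stand-in task performance
beta = 0.1                                                 # diversity weight
for i in range(4):
    peers = [p for j, p in enumerate(psis) if j != i]
    objective = task_returns[i] + beta * diversity_bonus(psis[i], peers)
    print(f"policy {i}: objective {objective:.2f}")
```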

Privacy-Preserving Machine Learning

Sony AI’s research in privacy-preserving machine learning tackles critical challenges at the intersection of data security, utility, and compliance. As privacy regulations tighten and datasets become more sensitive, the need for robust frameworks to protect data while enabling meaningful AI applications has grown. This research introduces innovative solutions that address these complexities, including benchmarks for evaluating privacy-utility trade-offs, frameworks for federated learning across heterogeneous models, and tools for enhancing privacy in recommender systems and medical AI. Together, these advancements not only safeguard sensitive information but also set new standards for scalable, privacy-compliant AI development.

DECO-Bench: Unified Benchmark for Privacy-Preserving Data

Authors: Farzaneh Askari, Lingjuan Lyu, Vivek Sharma

The Problem
Sharing datasets containing sensitive information risks privacy breaches, and current decoupling methods lack a standardized benchmark to evaluate their effectiveness.

The Solution
DECO-Bench introduces a comprehensive framework to benchmark decoupling techniques, integrating synthetic data generation, structured data splits, and evaluation metrics for privacy-utility trade-offs.

The Challenge
Ensuring sensitive attributes are anonymized while maintaining non-sensitive data utility, balancing trade-offs in complex real-world datasets.

The Result
DECO-Bench enables systematic evaluation of decoupling methods, releasing datasets, pre-trained models, and metrics to accelerate privacy-preserving research and create a standard for task-agnostic synthetic data benchmarks.


Read the full research paper

pFedClub: Personalized Federated Learning at Scale

Authors: Jiaqi Wang, Qi Li, Lingjuan Lyu, Fenglong Ma

The Problem
Federated learning often requires uniform model structures across clients, limiting real-world applications where diverse client models and personalized needs are common.

The Solution
pFedClub introduces a novel block substitution and aggregation method, creating personalized models by clustering functional blocks from heterogeneous client models and selecting optimal configurations.

The Challenge
Balancing the creation of personalized models while managing computational costs, maintaining privacy, and ensuring compatibility with varying client resources.

The Result
pFedClub outperformed state-of-the-art methods in accuracy and efficiency, generating size-controllable personalized models suitable for diverse federated learning scenarios without compromising computational or communication resources.


Read the full research paper
The source code can be found at https://github.com/JackqqWang/24club
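
A drastically simplified sketch of the block substitution idea: treat each client model as a stack of blocks with a shared interface, group blocks across clients by how similarly they transform shared probe inputs, and assemble a personalized model for a client by substituting blocks from the matching clusters. The clustering and selection rules below are placeholders, not pFedClub's actual aggregation procedure.

```python
# Toy block clustering and substitution for heterogeneous federated models.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

DIM, N_CLUSTERS = 16, 3
probe = torch.randn(32, DIM)                      # shared probe inputs

# Heterogeneous clients: different numbers of blocks, same block interface.
clients = [nn.ModuleList(nn.Linear(DIM, DIM) for _ in range(depth)) for depth in (2, 3, 4)]

# 1) Signature of every block = its output on the probe inputs (flattened).
blocks, signatures = [], []
for model in clients:
    for block in model:
        blocks.append(block)
        with torch.no_grad():
            signatures.append(block(probe).flatten().numpy())

# 2) Cluster functionally similar blocks across clients.
labels = list(KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0).fit_predict(signatures))

# 3) Personalize client 0: replace each of its blocks with the first block found
#    in the same functional cluster (a crude stand-in for "selecting the best").
personalized = []
for offset, _ in enumerate(clients[0]):
    cluster = labels[offset]
    personalized.append(blocks[labels.index(cluster)])
personalized_model = nn.Sequential(*personalized)
print(personalized_model)
```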

CURE4Rec: Benchmarking Privacy in Recommender Systems

Authors: Chaochao Chen, Jiaming Zhang, Yizhao Zhang, Li Zhang, Lingjuan Lyu, Yuyuan Li, Biao Gong, Chenggang Yan

The Problem
Current recommendation unlearning methods lack a unified evaluation framework, overlooking key aspects like fairness and efficiency while addressing privacy regulations like GDPR and the Delete Act.

The Solution
CURE4Rec provides a comprehensive benchmark to evaluate unlearning methods across four dimensions: completeness, utility, efficiency, and fairness, using varying data selection strategies such as core, edge, and random data.

The Challenge
Balancing unlearning completeness with computational efficiency, preserving utility for remaining data, and ensuring fairness in recommendations across diverse user groups.

The Result
CURE4Rec establishes a new standard for evaluating privacy-preserving recommender systems, enabling systematic improvement in unlearning methods while ensuring fairness and robustness against diverse data challenges.


Read the full research paper
Our code is available at https://github.com/xiye7lai/CURE4Rec

FEDMEKI: Advancing Privacy-Preserving Medical AI

Authors: Jiaqi Wang, Xiaochen Wang, Lingjuan Lyu, Jinghui Chen, Fenglong Ma

The Problem
Medical foundation models struggle to integrate knowledge from sensitive, decentralized datasets due to privacy regulations like HIPAA.

The Solution
FEDMEKI introduces a federated knowledge injection platform, enabling multi-modal, multi-task learning from distributed medical data without sharing private information.

The Challenge
Adapting decentralized client data into a unified foundation model while preserving privacy and ensuring scalability across diverse medical modalities.

The Result
FEDMEKI achieved state-of-the-art performance in privacy-preserving model scaling, supporting eight diverse medical tasks and demonstrating effective zero-shot generalization.


Read the full research paper
Code is available at https://github.com/psudslab/FEDMEKI

Research Contributions Across Our Leadership and Research Teams

Sony AI’s research achievements reflect the collective expertise of our leadership and researchers, many of whom bridge roles between industry and academia. This synergy enables groundbreaking advancements across fields such as reinforcement learning, skill discovery, and analog AI training. By tackling foundational challenges—like enabling multi-agent collaboration, uncovering meaningful skills in complex environments, and refining energy-efficient AI methods—our teams are driving innovation that addresses both theoretical and practical hurdles in AI. These contributions exemplify the breadth of our efforts to shape a more capable and adaptable future for artificial intelligence and showcase the capabilities of our diverse global team.

Consent in Crisis: The Rapid Decline of the AI Data Commons

The Problem
The internet has long served as the primary "data commons" for AI systems, with vast web datasets fueling advancements in general-purpose AI. However, the mechanisms for signaling data consent, such as robots.txt files and Terms of Service, were not designed with AI in mind. A growing number of web domains are imposing restrictions on data collection for AI training, leading to inconsistencies and limitations in available training data.

The Solution
This research conducted a large-scale, longitudinal audit of 14,000 web domains used in major AI corpora, such as C4, RefinedWeb, and Dolma. It examined the evolution of consent protocols, including robots.txt and Terms of Service, to diagnose the mismatch between traditional data use protocols and AI's reliance on web content.

The Challenge
Balancing the need for open data in AI development with emerging legal and ethical concerns. Restrictions are increasingly skewing data diversity, representativity, and freshness, jeopardizing scaling laws and the quality of AI training corpora.

The Result
The study found a significant rise in restrictive practices. In just one year, ~5% of tokens in major datasets became restricted by robots.txt, and nearly 45% carried restrictions in Terms of Service. These restrictions disproportionately impact high-quality, actively maintained web sources, threatening data availability for both commercial and non-commercial AI initiatives.

Read the full research paper
Additional Details
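
One of the consent signals the audit tracks, robots.txt, can be inspected with Python's standard library. The snippet below checks whether a domain allows a few AI-associated crawlers; the user-agent names are examples and example.com is a placeholder, not part of the study's corpus.

```python
# Checking a domain's robots.txt for AI-crawler restrictions with the standard library.
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended", "anthropic-ai"]   # example agents

def audit_domain(domain: str) -> dict:
    rp = RobotFileParser()
    rp.set_url(f"https://{domain}/robots.txt")
    rp.read()                                    # fetches and parses robots.txt
    return {agent: rp.can_fetch(agent, f"https://{domain}/") for agent in AI_CRAWLERS}

if __name__ == "__main__":
    for domain in ["example.com"]:               # the paper audits ~14,000 domains
        print(domain, audit_domain(domain))
```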

POAM: Dynamic Multi-Agent Collaboration

Authors: Caroline Wang, Arrasy Rahman, Ishan Durugkar, Elad Liebman, Peter Stone

The Problem
Existing MARL methods assume either full control or single-agent adaptation, failing to address scenarios with mixed controlled and uncontrolled agents.

The Solution
POAM uses an encoder-decoder network to model teammate behavior and dynamically adapt agent policies based on learned embeddings.

The Challenge
Balancing generalization to unseen teammates, managing diverse team compositions, and ensuring efficient learning in partially observable environments.

The Result
POAM outperformed baselines in adaptability and sample efficiency, excelling in tasks like StarCraft and predator-prey scenarios while generalizing to new teammates.


Read the full research paper
Code available here
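
The mechanism described above can be sketched with toy modules: an encoder summarizes observed teammate behavior into an embedding, a decoder reconstructs teammate observations as an auxiliary objective, and the controlled agent's policy conditions on the embedding. Sizes, networks, and the reconstruction target are placeholders, not POAM's architecture.

```python
# Toy sketch of teammate modeling: encoder -> behavior embedding -> conditioned policy.
import torch
import torch.nn as nn

OBS, ACT, EMB, N_ACTIONS = 24, 5, 8, 5

class TeammateEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(OBS + ACT, EMB, batch_first=True)
    def forward(self, history):                  # (B, T, OBS+ACT) teammate obs/actions
        _, h = self.gru(history)
        return h.squeeze(0)                      # (B, EMB) behavior embedding

class TeammateDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(EMB, OBS + ACT)     # predict the teammate's latest step
    def forward(self, emb):
        return self.net(emb)

class ConditionedPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS + EMB, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
    def forward(self, obs, emb):
        return torch.distributions.Categorical(logits=self.net(torch.cat([obs, emb], -1)))

encoder, decoder, policy = TeammateEncoder(), TeammateDecoder(), ConditionedPolicy()
history = torch.randn(2, 10, OBS + ACT)          # observed teammate trajectory
obs = torch.randn(2, OBS)                        # controlled agent's own observation
emb = encoder(history)
action = policy(obs, emb).sample()               # adapts to the inferred teammate type
recon_loss = nn.functional.mse_loss(decoder(emb), history[:, -1])   # auxiliary objective
print(action, recon_loss.item())
```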

SkiLD: Smarter Skill Discovery

Authors: Zizhao Wang, Jiaheng Hu, Caleb Chuck, Stephen Chen, Roberto Martín-Martín, Amy Zhang, Scott Niekum, Peter Stone

The Problem
Unsupervised skill discovery methods fail in complex environments, often producing overly simplistic skills that struggle with tasks requiring interactions between state factors.

The Solution
SkiLD introduces state-specific dependency graphs to guide skill discovery, encouraging diverse interactions between factors for meaningful downstream applications.

The Challenge
Balancing the complexity of learning interaction-driven skills with ensuring efficient exploration in vast state spaces.

The Result
SkiLD outperformed existing methods in complex environments, producing semantically rich skills that excelled in long-horizon tasks like robotic manipulation and household activities.


Read the full research paper
Code and visualizations
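
The idea of skills defined by interactions can be illustrated with a toy intrinsic reward: each skill names a target interaction between two state factors and is rewarded when that interaction occurs in a transition. The crude "both factors changed" detector below is only a stand-in for SkiLD's learned, state-specific dependency graphs.

```python
# Toy interaction-based skill reward over a factored state.
import numpy as np

FACTORS = ["agent", "object", "door"]

def detect_interactions(state, next_state, eps=1e-6):
    """Return index pairs (i, j) whose factors both changed in this transition."""
    changed = [np.linalg.norm(next_state[f] - state[f]) > eps for f in FACTORS]
    return {(i, j) for i in range(len(FACTORS)) for j in range(len(FACTORS))
            if i < j and changed[i] and changed[j]}

def skill_reward(skill_target, state, next_state):
    """+1 if the skill's target interaction happened, else 0."""
    return float(skill_target in detect_interactions(state, next_state))

state = {"agent": np.array([0.0, 0.0]), "object": np.array([1.0, 1.0]), "door": np.array([0.0])}
next_state = {"agent": np.array([0.9, 1.0]), "object": np.array([1.5, 1.0]), "door": np.array([0.0])}

push_object = (0, 1)    # skill: make the agent interact with the object
open_door = (0, 2)      # skill: make the agent interact with the door
print(skill_reward(push_object, state, next_state))   # 1.0: agent and object both moved
print(skill_reward(open_door, state, next_state))     # 0.0: the door did not change
```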

DUSDi: Disentangling Skills for Smarter Reinforcement Learning

Authors: Jiaheng Hu, Zizhao Wang, Roberto Martín-Martín, Peter Stone

The Problem
Existing skill discovery methods produce entangled skills, making it difficult for agents to solve complex tasks involving independent state factors.

The Solution
DUSDi introduces disentangled skills by mapping skill components to independent state factors, leveraging mutual information objectives to ensure minimal overlap between skill effects.

The Challenge
Balancing efficient learning of disentangled skills while addressing the complexity of environments with multiple state factors and ensuring effective downstream task performance.

The Result
DUSDi outperformed state-of-the-art methods in complex tasks, enabling agents to tackle multi-factor challenges efficiently through disentangled, reusable skills for hierarchical reinforcement learning.


Read the full research paper
Code and skills visualization
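
The disentanglement objective can be sketched with one discriminator per state factor: the intrinsic reward for skill component z_i is a variational lower bound on the mutual information between z_i and state factor s_i alone, which pushes each component to control only its own factor. Shapes, networks, and the uniform skill prior below are toy stand-ins for DUSDi's setup.

```python
# Toy disentangled-skill reward: skill component z_i should be predictable
# from state factor s_i alone (per-factor variational MI lower bound).
import torch
import torch.nn as nn

N_FACTORS, FACTOR_DIM, N_SKILLS = 3, 4, 5       # 3 state factors, 5 discrete skills each

# One discriminator per factor: q_i(z_i | s_i)
discriminators = nn.ModuleList(
    nn.Sequential(nn.Linear(FACTOR_DIM, 32), nn.ReLU(), nn.Linear(32, N_SKILLS))
    for _ in range(N_FACTORS)
)

def intrinsic_reward(state_factors, skill):
    """Sum over factors of log q_i(z_i | s_i) - log p(z_i), with a uniform prior p."""
    reward = 0.0
    for i in range(N_FACTORS):
        logits = discriminators[i](state_factors[:, i])           # (B, N_SKILLS)
        log_q = logits.log_softmax(-1).gather(-1, skill[:, i:i + 1]).squeeze(-1)
        reward = reward + log_q + torch.log(torch.tensor(float(N_SKILLS)))
    return reward                                                 # (B,)

# Usage: sample a factored skill, visit some states, and score them.
batch = 8
skill = torch.randint(0, N_SKILLS, (batch, N_FACTORS))            # z = (z_1, z_2, z_3)
states = torch.randn(batch, N_FACTORS, FACTOR_DIM)                # visited factored states
r = intrinsic_reward(states, skill)
loss = -r.mean()    # the same bound trains the discriminators; the policy is trained by RL
loss.backward()
print(r.shape, r.mean().item())
```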

Tiki-Taka: Refining Analog AI Training

Authors: Zhaoxian Wu, Malte J. Rasch, Tayfun Gokmen, Tianyi Chen

The Problem
Analog in-memory computing accelerators reduce energy costs but face challenges with noise and asymmetric updates during gradient-based training, leading to suboptimal performance.

The Solution
Tiki-Taka introduces an auxiliary device to correct noise and asymmetric updates, ensuring convergence to critical points with improved accuracy.

The Challenge
Addressing the asymmetry in weight updates while managing noise effectively across analog accelerators without increasing computational costs.

The Result
Tiki-Taka eliminates asymptotic errors, outperforming Analog SGD with faster convergence and accuracy comparable to digital training, making analog AI training more viable.


Read the full research paper
The digital algorithms (including SGD) and the datasets used in this paper (MNIST and CIFAR-10) are provided by PyTorch, which has a BSD license; see https://github.com/pytorch/pytorch
The behavior of Analog SGD can be simulated with IBM's AIHWKIT; see github.com/IBM/aihwkit
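
The qualitative effect can be reproduced in a scalar toy: a noisy, state-dependent "analog write" stands in for the device, plain Analog SGD applies every gradient write directly to the weight, and the Tiki-Taka-style variant accumulates writes on an auxiliary variable that is periodically transferred to the weight. The device model, learning rates, and transfer rule are simplifications for illustration, not the paper's analysis or the AIHWKIT implementation.

```python
# Scalar toy contrasting plain Analog SGD with a Tiki-Taka-style two-array scheme
# on f(w) = 0.5 * (w - 3)^2.
import numpy as np

rng = np.random.default_rng(0)
TARGET, LR, STEPS, TAU, NOISE = 3.0, 0.05, 2000, 10.0, 0.05

def grad(w):
    return w - TARGET

def analog_write(value, delta):
    """Noisy write whose effective step shrinks or grows with the stored value."""
    delta = delta + NOISE * rng.normal()
    return value + delta * (1.0 - np.sign(delta) * value / TAU)

# Plain Analog SGD: every (noisy, asymmetric) gradient write lands on w directly.
w = 0.0
for _ in range(STEPS):
    w = analog_write(w, -LR * grad(w))
print(f"Analog SGD error:      {abs(w - TARGET):.3f}")

# Tiki-Taka-style: gradient writes accumulate on an auxiliary variable a, which
# stays near zero (where the device is nearly symmetric) and is periodically
# transferred onto w; the partial reset of a is a further simplification.
w, a, TRANSFER_EVERY, TRANSFER_LR = 0.0, 0.0, 10, 0.1
for step in range(STEPS):
    a = analog_write(a, -LR * grad(w))
    if (step + 1) % TRANSFER_EVERY == 0:
        w = analog_write(w, TRANSFER_LR * a)
        a *= 0.5
print(f"Tiki-Taka-style error: {abs(w - TARGET):.3f}")
```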

Conclusion

NeurIPS 2024 highlights the rapid pace of AI advancements and the challenges that come with them. Sony AI's contributions this year are a testament to our commitment to pushing boundaries while addressing pressing concerns like security, efficiency, and creativity. Whether it’s securing generative models, accelerating diffusion processes, or pioneering smarter multi-agent systems, we aim to inspire and drive progress across the AI landscape. Join us at NeurIPS to explore these ideas further and help shape the future of artificial intelligence.

