Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement

Naoki Murata

Yuhta Takida

Toshimitsu Uesaka

Takashi Shibuya

Shusuke Takahashi*

Yuki Mitsufuji

* External authors

Interspeech '23

2023

Abstract

Although deep neural network (DNN)-based speech enhancement (SE) methods outperform earlier non-DNN-based ones, they often degrade the perceptual quality of their outputs. To tackle this problem, we introduce a DNN-based generative refiner, Diffiner, which aims to improve the perceptual quality of speech that has been pre-processed by an SE method. We train a diffusion-based generative model on a dataset consisting of clean speech only. Our refiner then mixes clean parts, newly generated via denoising diffusion restoration, into the parts degraded and distorted by the preceding SE method, yielding refined speech. Once trained on a set of clean speech, the refiner can be applied to various SE methods without additional training specialized for each SE module. It can therefore serve as a versatile post-processing module for SE methods and has high potential in terms of modularity. Experimental results show that our method improved perceptual speech quality regardless of the preceding SE method used.
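
The mixing idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the function name `mix_bins`, the per-bin `distortion_var` estimate, the scalar `step_var`, and the weighting rule itself are all assumptions made for illustration. The sketch only shows the general principle of keeping the SE output where it is judged reliable and leaning on newly generated content where it is judged distorted.

```python
import numpy as np


def mix_bins(se_output, generated, distortion_var, step_var):
    """Blend two signals bin by bin (hypothetical helper, not from the paper).

    Bins with low estimated distortion keep the SE output; bins with high
    estimated distortion take the generated value instead.
    step_var must be positive so the weight is well defined.
    """
    se_output = np.asarray(se_output, dtype=float)
    generated = np.asarray(generated, dtype=float)
    distortion_var = np.asarray(distortion_var, dtype=float)

    # Illustrative weighting rule (an assumption): weight in (0, 1],
    # equal to 1 when a bin is judged undistorted.
    weight = step_var / (step_var + distortion_var)
    return weight * se_output + (1.0 - weight) * generated


# Three bins: undistorted, moderately distorted, heavily distorted.
se_output = np.array([1.0, 1.0, 1.0])
generated = np.array([0.0, 0.0, 0.0])
distortion_var = np.array([0.0, 1.0, 1e6])

refined = mix_bins(se_output, generated, distortion_var, step_var=1.0)
```

In this toy setting the first bin keeps the SE output, the second is an even blend, and the third is almost entirely the generated value. In the actual method the generated content comes from a diffusion model trained on clean speech, which the sketch replaces with a fixed array.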

Related Publications

Weighted Point Cloud Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric

ICLR, 2025
Toshimitsu Uesaka, Taiji Suzuki, Yuhta Takida, Chieh-Hsin Lai, Naoki Murata, Yuki Mitsufuji

In typical multimodal contrastive learning, such as CLIP, encoders produce one point in the latent representation space for each input. However, one-point representation has difficulty in capturing the relationship and the similarity structure of a huge amount of instances in…

Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning

ICLR, 2025
Shang-Fu Chen, Chieh-Hsin Lai, Dongjun Kim*, Naoki Murata, Takashi Shibuya, Wei-Hsiang Liao, Shao-Hua Sun, Yuki Mitsufuji, Ayano Hiranaka

Controllable generation through Stable Diffusion (SD) fine-tuning aims to improve fidelity, safety, and alignment with human guidance. Existing reinforcement learning from human feedback methods usually rely on predefined heuristic reward functions or pretrained reward model…

Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models

ICLR, 2025
Saurav Jha, Shiqi Yang*, Masato Ishii, Mengjie Zhao*, Christian Simon, Muhammad Jehanzeb Mirza, Dong Gong, Lina Yao, Shusuke Takahashi*, Yuki Mitsufuji

Personalized text-to-image diffusion models have grown popular for their ability to efficiently acquire a new concept from user-defined text descriptions and a few images. However, in the real world, a user may wish to personalize a model on multiple concepts but one at a ti…
