Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement

Authors: Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji

* External authors

Venue: Interspeech '23

Date: 2023

Abstract

Although deep neural network (DNN)-based speech enhancement (SE) methods outperform previous non-DNN-based ones, they often degrade the perceptual quality of their outputs. To tackle this problem, we introduce Diffiner, a DNN-based generative refiner that aims to improve the perceptual quality of speech pre-processed by an SE method. We train a diffusion-based generative model on a dataset consisting of clean speech only. Our refiner then uses denoising diffusion restoration to mix newly generated clean parts into the parts degraded and distorted by the preceding SE method, resulting in refined speech. Once trained on clean speech, the refiner can be applied to various SE methods without additional training specialized for each SE module. It can therefore serve as a versatile post-processing module for SE methods and has high potential for modular use. Experimental results show that our method improved perceptual speech quality regardless of the preceding SE method used.
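The abstract describes mixing newly generated clean content into the degraded parts of an SE output via denoising diffusion restoration. The Python sketch below is a hypothetical illustration of that idea, not the authors' algorithm or code. It follows the update rules of denoising diffusion restoration models (DDRM) for the simplest case of an identity degradation operator, and it assumes that the SE output `y` equals clean speech plus Gaussian distortion of a known scalar level `sigma_y`. The names `ddrm_identity_refine` and `toy_denoiser`, the parameters `eta` and `eta_b`, the noise schedule, and the flat feature array (in place of, for example, spectrogram coefficients) are all assumptions made for illustration. The toy denoiser is the exact posterior-mean denoiser for a standard Gaussian prior and only stands in for a diffusion model trained on clean speech.

```python
import numpy as np


def toy_denoiser(x_t, sigma):
    """Hypothetical stand-in for a diffusion model trained on clean speech.

    Returns E[x0 | x_t] for the toy prior x0 ~ N(0, I) with x_t = x0 + sigma * noise.
    """
    return x_t / (1.0 + sigma ** 2)


def ddrm_identity_refine(y, sigma_y, denoiser, sigmas, eta=0.85, eta_b=1.0, rng=None):
    """DDRM-style sampling for y = x + sigma_y * noise (identity operator).

    sigmas: decreasing noise levels ending at 0.0, with sigmas[0] > sigma_y.
    Assumes sigma_y > 0, 0 <= eta <= 1 and 0 <= eta_b <= 1.
    """
    if sigmas[0] <= sigma_y:
        raise ValueError("the largest noise level must exceed sigma_y")
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)

    # Start from the observation plus the noise it is still missing:
    # x_T ~ N(y, sigma_T^2 - sigma_y^2).
    x = y + np.sqrt(sigmas[0] ** 2 - sigma_y ** 2) * rng.standard_normal(y.shape)

    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = denoiser(x, sigma_cur)  # newly generated clean estimate
        noise = rng.standard_normal(y.shape)
        if sigma_next < sigma_y:
            # Below the observation noise level: move the generated estimate
            # only partway toward y, then add fresh noise scaled by eta.
            x = (x0_hat
                 + np.sqrt(1.0 - eta ** 2) * sigma_next * (y - x0_hat) / sigma_y
                 + eta * sigma_next * noise)
        else:
            # At or above the observation noise level: blend the generated
            # estimate with y and add the remaining noise variance.
            x = ((1.0 - eta_b) * x0_hat + eta_b * y
                 + np.sqrt(sigma_next ** 2 - (eta_b * sigma_y) ** 2) * noise)
    # The last step has sigma_next == 0 < sigma_y, so x equals the final x0_hat.
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.standard_normal(16)  # hypothetical SE-output features
    sigmas = list(np.geomspace(10.0, 0.01, 50)) + [0.0]
    refined = ddrm_identity_refine(y, sigma_y=0.5, denoiser=toy_denoiser,
                                   sigmas=sigmas, rng=rng)
    print(np.round(refined[:4], 3))
```

In this sketch, `eta_b` controls how strongly the high-noise steps pull toward the SE output, while steps whose noise level falls below `sigma_y` rely increasingly on the generated estimate. Because the function needs only the SE output `y` and a denoiser trained independently of any SE method, it can be wrapped around different SE front ends without retraining, which loosely reflects the modularity described in the abstract.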

Related Publications

Music Arena: Live Evaluation for Text-to-Music

NeurIPS, 2025
Yonghyun Kim, Wayne Chi, Anastasios N. Angelopoulos, Wei-Lin Chiang, Koichi Saito, Shinji Watanabe, Yuki Mitsufuji, Chris Donahue

We present Music Arena, an open platform for scalable human preference evaluation of text-to-music (TTM) models. Soliciting human preferences via listening studies is the gold standard for evaluation in TTM, but these studies are expensive to conduct and difficult to compare…

Large-Scale Training Data Attribution for Music Generative Models via Unlearning

NeurIPS, 2025
Woosung Choi, Junghyun Koo*, Kin Wai Cheuk, Joan Serrà, Marco A. Martínez-Ramírez, Yukara Ikemiya, Naoki Murata, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji

This paper explores the use of unlearning methods for training data attribution (TDA) in music generative models trained on large-scale datasets. TDA aims to identify which specific training data points contributed to the generation of a particular output from a specific mod…

Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion

NeurIPS, 2025
Michail Dontas, Yutong He, Naoki Murata, Yuki Mitsufuji, J. Zico Kolter*, Ruslan Salakhutdinov*

Blind inverse problems, where both the target data and forward operator are unknown, are crucial to many computer vision applications. Existing methods often depend on restrictive assumptions such as additional training, operator linearity, or narrow image distributions, thu…
