Authors

* External authors

Venue

Date

Share

Unsupervised vocal dereverberation with diffusion-based generative models

Koichi Saito

Naoki Murata

Toshimitsu Uesaka

Chieh-Hsin Lai

Yuhta Takida

Takao Fukui*

Yuki Mitsufuji

* External authors

ICASSP 2023

2023

Abstract

Removing reverb from reverberant music is a necessary technique to clean up audio for downstream music manipulations. Reverberation of music contains two categories, natural reverb, and artificial reverb. Artificial reverb has a wider diversity than natural reverb due to its various parameter setups and reverberation types. However, recent supervised dereverberation methods may fail because they rely on sufficiently diverse and numerous pairs of reverberant observations and retrieved data for training in order to be generalizable to unseen observations during inference. To resolve these problems, we propose an unsupervised method that can remove a general kind of artificial reverb for music without requiring pairs of data for training. The proposed method is based on diffusion models, where it initializes the unknown reverberation operator with a conventional signal processing technique and simultaneously refines the estimate with the help of diffusion models. We show through objective and perceptual evaluations that our method outperforms the current leading vocal dereverberation benchmarks.

Related Publications

PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

NeurIPS, 2024
Dongjun Kim*, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon*

To accelerate sampling, diffusion models (DMs) are often distilled into generators that directly map noise to data in a single step. In this approach, the resolution of the generator is fundamentally limited by that of the teacher DM. To overcome this limitation, we propose …

GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping

NeurIPS, 2024
Junyoung Seo, Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, Chieh-Hsin Lai, Seungryong Kim*, Yuki Mitsufuji

Generating novel views from a single image remains a challenging task due to the complexity of 3D scenes and the limited diversity in the existing multi-view datasets to train a model on. Recent research combining large-scale text-to-image (T2I) models with monocular depth e…

The whole is greater than the sum of its parts: improving music source separation by bridging networks

EURASIP, 2024
Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich*, Shusuke Takahashi*, Yuki Mitsufuji

This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) with almost no increasing calculation cost. It consists of three components: (i) multi-domain loss (MDL), (ii) bridging operation…

  • HOME
  • Publications
  • Unsupervised vocal dereverberation with diffusion-based generative models

JOIN US

Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.