



Unsupervised vocal dereverberation with diffusion-based generative models

Koichi Saito

Naoki Murata

Toshimitsu Uesaka

Chieh-Hsin Lai

Yuhta Takida

Takao Fukui*

Yuki Mitsufuji

* External authors




Removing reverb from reverberant music is a necessary step in cleaning up audio for downstream music manipulation. Reverberation in music falls into two categories: natural reverb and artificial reverb. Artificial reverb is more diverse than natural reverb because of its many parameter settings and reverberation types. As a result, recent supervised dereverberation methods may fail: to generalize to unseen observations at inference time, they rely on sufficiently diverse and numerous pairs of reverberant observations and retrieved data for training. To resolve this problem, we propose an unsupervised method that can remove general artificial reverb from music without requiring paired data for training. The proposed method is based on diffusion models: it initializes the unknown reverberation operator with a conventional signal processing technique and simultaneously refines the estimate with the help of diffusion models. Objective and perceptual evaluations show that our method outperforms the current leading vocal dereverberation benchmark methods.
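To make the term "reverberation operator" concrete, the sketch below illustrates only the forward problem, under the common assumption that reverb acts as a linear convolution of the dry signal with an impulse response. It is not the method proposed in the paper. The helper names (`synthetic_impulse_response`, `apply_reverb`), the decaying-noise impulse response, and all parameter values are hypothetical and chosen for illustration.

```python
import numpy as np


def synthetic_impulse_response(sample_rate, rt60, seed=0):
    """Toy artificial reverb: a direct path followed by a decaying noise tail.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = int(sample_rate * rt60)
    t = np.arange(n) / sample_rate
    # Envelope drops by 60 dB (a factor of 1e-3 in amplitude) at t = rt60.
    envelope = 10.0 ** (-3.0 * t / rt60)
    ir = 0.3 * rng.standard_normal(n) * envelope
    ir[0] = 1.0  # direct (dry) path
    return ir


def apply_reverb(dry, ir):
    """Reverberant observation as the linear convolution of dry signal and IR."""
    return np.convolve(dry, ir)


sample_rate = 16000
t = np.arange(sample_rate // 4) / sample_rate
dry = np.sin(2 * np.pi * 220.0 * t)  # 0.25 s tone standing in for a dry vocal
ir = synthetic_impulse_response(sample_rate, rt60=0.5)
wet = apply_reverb(dry, ir)

print(len(dry), len(ir), len(wet))
```

In this picture, a supervised method would train on many (`wet`, `dry`) pairs, whereas an unsupervised method observes only `wet` and has to estimate both the dry signal and the unknown operator (here `ir`).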



