Authors

* External authors

Venue

Date

Share

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance

Carlos Hernandez-Olivan*

Koichi Saito

Naoki Murata

Chieh-Hsin Lai

Marco A. Martínez-Ramírez

Wei-Hsiang Liao

Yuki Mitsufuji

* External authors

ICASSP 2024

2023

Abstract

Restoring degraded music signals is essential to enhance audio quality for downstream music manipulation. Recent diffusion-based music restoration methods have demonstrated impressive performance, and among them, diffusion posterior sampling (DPS) stands out given its intrinsic properties, making it versatile across various restoration tasks. In this paper, we identify that there are potential issues which will degrade current DPS-based methods' performance and introduce the way to mitigate the issues inspired by diverse diffusion guidance techniques including the RePaint (RP) strategy and the Pseudoinverse-Guided Diffusion Models (ΠGDM). We demonstrate our methods for the vocal declipping and bandwidth extension tasks under various levels of distortion and cutoff frequency, respectively. In both tasks, our methods outperform the current DPS-based music restoration benchmarks. We refer to \url{this http URL} for examples of the restored audio samples.

Related Publications

Music Arena: Live Evaluation for Text-to-Music

NeurIPS, 2025
Yonghyun Kim, Wayne Chi, Anastasios N. Angelopoulos, Wei-Lin Chiang, Koichi Saito, Shinji Watanabe, Yuki Mitsufuji, Chris Donahue

We present Music Arena, an open platform for scalable human preference evaluation of text-to-music (TTM) models. Soliciting human preferences via listening studies is the gold standard for evaluation in TTM, but these studies are expensive to conduct and difficult to compare…

Large-Scale Training Data Attribution for Music Generative Models via Unlearning

NeurIPS, 2025
Woosung Choi, Junghyun Koo*, Kin Wai Cheuk, Joan Serrà, Marco A. Martínez-Ramírez, Yukara Ikemiya, Naoki Murata, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji

This paper explores the use of unlearning methods for training data attribution (TDA) in music generative models trained on large-scale datasets. TDA aims to identify which specific training data points contributed to the generation of a particular output from a specific mod…

Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion

NeurIPS, 2025
Michail Dontas, Yutong He, Naoki Murata, Yuki Mitsufuji, J. Zico Kolter*, Ruslan Salakhutdinov*

Blind inverse problems, where both the target data and forward operator are unknown, are crucial to many computer vision applications. Existing methods often depend on restrictive assumptions such as additional training, operator linearity, or narrow image distributions, thu…

  • HOME
  • Publications
  • VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance

JOIN US

Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.