Authors

Venue

Date

Share

Classifier-Free Guidance inside the Attraction Basin May Cause Memorization

Anubhav Jain

Yuya Kobayashi

Takashi Shibuya

Yuhta Takida

Nasir Memon

Julian Togelius

Yuki Mitsufuji

CVPR-25

2025

Abstract

Diffusion models are prone to exactly reproduce images from the training data. This exact reproduction of the training data is concerning as it can lead to copyright infringement and/or leakage of privacy-sensitive information. In this paper, we present a novel way to understand the memorization phenomenon, and propose a simple yet effective approach to mitigate it. We argue that memorization occurs because of an attraction basin in the denoising process which steers the diffusion trajectory towards a memorized image. However, this can be mitigated by guiding the diffusion trajectory away from the attraction basin by not applying classifier-free guidance until an ideal transition point occurs from which classifier-free guidance is applied. This leads to the generation of non-memorized images that are high in image quality and well-aligned with the conditioning mechanism. To further improve on this, we present a new guidance technique, \emph{opposite guidance}, that escapes the attraction basin sooner in the denoising process. We demonstrate the existence of attraction basins in various scenarios in which memorization occurs, and we show that our proposed approach successfully mitigates memorization.

Related Publications

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

CVPR, 2025
Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, Alexander Schwing, Yuki Mitsufuji

We propose to synthesize high-quality and synchronized audio, given video and optional text conditions, using a novel multimodal joint training framework MMAudio. In contrast to single-modality training conditioned on (limited) video data only, MMAudio is jointly trained wit…

VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression

ICASSP, 2025
Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo*, Yukara Ikemiya, Zhi Zhong*, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee*, Wei-Hsiang Liao, Yuki Mitsufuji

Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of rate-distortion tradeoff, particularly …

Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer

ICASSP, 2025
Michele Mancusi, Yurii Halychanskyi, Kin Wai Cheuk, Eloi Moliner, Chieh-Hsin Lai, Stefan Uhlich*, Junghyun Koo*, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro*, Yuki Mitsufuji

Music timbre transfer is a challenging task that involves modifying the timbral characteristics of an audio signal while preserving its melodic structure. In this paper, we propose a novel method based on dual diffusion bridges, trained using the CocoChorales Dataset, which …

  • HOME
  • Publications
  • Classifier-Free Guidance inside the Attraction Basin May Cause Memorization

JOIN US

Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.