FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation

Chieh-Hsin Lai

Yuhta Takida

Naoki Murata

Toshimitsu Uesaka

Yuki Mitsufuji

Stefano Ermon*

* External authors

ICML 2023

2023

Abstract

Score-based generative models learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are tied together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation, called the score FPE, that characterizes the noise-conditional scores of the perturbed data densities (i.e., the gradients of their log-densities). Surprisingly, despite impressive empirical performance, we observe that scores learned via denoising score matching (DSM) do not satisfy the underlying score FPE. We prove that satisfying the score FPE is desirable, as it improves the likelihood and the degree of conservativity. Hence, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and we show the effectiveness of this approach across various datasets.
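
To make the idea of a score satisfying its own PDE concrete, the following is a minimal sketch on a toy case, not the paper's method or code. It assumes a 1D diffusion dx = dw (zero drift, unit diffusion coefficient) applied to Gaussian data N(0, var0), for which the exact score is s(x, t) = -x / (var0 + t) and the score FPE reduces to d_t s = d_x[0.5 * d_x s + 0.5 * s^2]. The function names, the finite-difference step `h`, and the perturbed score are all hypothetical choices made for illustration.

```python
import numpy as np

def score(x, t, var0=1.0):
    """Exact score of N(0, var0 + t): data N(0, var0) diffused by dx = dw."""
    return -x / (var0 + t)

def score_fpe_residual(s, x, t, h=1e-3):
    """Finite-difference residual of the 1D score FPE with f = 0, g = 1:
    d_t s - d_x[ 0.5 * d_x s + 0.5 * s**2 ]. Zero means the PDE holds."""
    def inner(y):
        # Term inside the outer spatial derivative, using a central difference.
        ds_dy = (s(y + h, t) - s(y - h, t)) / (2 * h)
        return 0.5 * ds_dy + 0.5 * s(y, t) ** 2
    ds_dt = (s(x, t + h) - s(x, t - h)) / (2 * h)
    d_inner_dx = (inner(x + h) - inner(x - h)) / (2 * h)
    return ds_dt - d_inner_dx

xs = np.linspace(-3.0, 3.0, 13)

# The exact score should satisfy the equation up to discretization error.
exact = score_fpe_residual(score, xs, t=0.5)

# A hypothetical imperfect score (exact score with a time-dependent scale error)
# stands in for a learned model that violates the equation.
wrong = score_fpe_residual(lambda x, t: score(x, t) * (1.0 + 0.1 * t), xs, t=0.5)

print(np.abs(exact).max(), np.abs(wrong).max())
```

In this toy setting the residual is close to zero for the exact score and clearly nonzero for the perturbed one. A residual of this kind is the sort of quantity a regularizer added to the DSM objective could penalize, although the abstract does not specify how the paper computes or weights it.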

Related Publications

Classifier-Free Guidance inside the Attraction Basin May Cause Memorization

CVPR, 2025
Anubhav Jain, Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, Julian Togelius, Yuki Mitsufuji

Diffusion models are prone to exactly reproduce images from the training data. This exact reproduction of the training data is concerning as it can lead to copyright infringement and/or leakage of privacy-sensitive information. In this paper, we present a novel way to unders…

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

CVPR, 2025
Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, Alexander Schwing, Yuki Mitsufuji

We propose to synthesize high-quality and synchronized audio, given video and optional text conditions, using a novel multimodal joint training framework MMAudio. In contrast to single-modality training conditioned on (limited) video data only, MMAudio is jointly trained wit…

VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression

ICASSP, 2025
Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo*, Yukara Ikemiya, Zhi Zhong*, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee*, Wei-Hsiang Liao, Yuki Mitsufuji

Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of rate-distortion tradeoff, particularly …

