
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

Yuhta Takida, Takashi Shibuya, Wei-Hsiang Liao, Chieh-Hsin Lai, Junki Ohmura*, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi*, Toshiyuki Kumakura*, Yuki Mitsufuji

* External authors

ICML 2022



One noted issue of the vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, also known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called the stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe that the quantization is stochastic at the initial stage of training but gradually converges toward a deterministic quantization, a behavior we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without using common heuristics. Furthermore, we empirically show that SQ-VAE is superior to VAE and VQ-VAE in vision- and speech-related tasks.
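To make the mechanism concrete, here is a minimal sketch of stochastic quantization with a learned variance. It is an illustrative assumption-laden example, not the paper's implementation: the class name, the scalar `log_var` parameter, and the Gumbel-softmax relaxation are all placeholder choices, and the full variational objective derived in the paper is omitted.

```python
# Minimal sketch of stochastic quantization (illustrative, NOT the exact
# SQ-VAE implementation; names and the relaxation are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticQuantizer(nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        # Learned variance of the quantization noise. As training drives it
        # toward zero, the categorical posterior below sharpens, so sampling
        # approaches a deterministic nearest-neighbor lookup ("self-annealing").
        self.log_var = nn.Parameter(torch.zeros(()))

    def forward(self, z_e: torch.Tensor) -> torch.Tensor:
        # z_e: (batch, dim) encoder outputs.
        var = self.log_var.exp()
        # Squared distances to every codebook vector: (batch, num_codes).
        d2 = torch.cdist(z_e, self.codebook.weight).pow(2)
        # Categorical posterior over codes; sharper as var shrinks.
        logits = -d2 / (2.0 * var)
        # Differentiable sample via Gumbel-softmax (one relaxation choice;
        # the paper uses its own reparameterized estimator).
        one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)
        return one_hot @ self.codebook.weight  # quantized z_q: (batch, dim)

quantizer = StochasticQuantizer()
z_q = quantizer(torch.randn(8, 64))
print(z_q.shape)  # torch.Size([8, 64])
```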

Related Publications

SilentCipher: Deep Audio Watermarking

Interspeech, 2024
Mayank Kumar Singh*, Naoya Takahashi, Weihsiang Liao, Yuki Mitsufuji

In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and robustness over traditional…

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

ICASSP, 2024
Takashi Shibuya, Yuhta Takida, Yuki Mitsufuji

Generative adversarial network (GAN)-based vocoders have been intensively studied because they can synthesize high-fidelity audio waveforms faster than real-time. However, it has been reported that most GANs fail to obtain the optimal projection for discriminating between re…

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes

TMLR, 2024
Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Wei-Hsiang Liao, Yuki Mitsufuji

Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity recon…
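For contrast with the stochastic scheme sketched above, the deterministic VQ this teaser refers to reduces to a nearest-neighbor codebook lookup; a minimal illustrative sketch (shapes and variable names are assumptions, not from the paper):

```python
# Deterministic vector quantization: nearest-neighbor codebook lookup,
# the operation VQ-VAE builds on (illustrative sketch only).
import torch

codebook = torch.randn(512, 64)                      # 512 codes of dimension 64
z_e = torch.randn(8, 64)                             # encoder outputs
indices = torch.cdist(z_e, codebook).argmin(dim=1)   # nearest code per input
z_q = codebook[indices]                              # quantized features: (8, 64)
```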
