Chieh-Hsin Lai

Profile

Since 2021, Chieh-Hsin Lai (Jesse) has worked as a research scientist at Sony AI, focusing on robustness, deep generative models, and theoretical deep learning. Before joining Sony AI, he was a research assistant at the Institute of Mathematics, Academia Sinica. He earned his PhD in Mathematics from the University of Minnesota, Twin Cities.

Message

“I specialize in developing theoretically grounded deep generative models that excel in high-fidelity sample generation, rapid sampling, ease of training, and controllable generation. I aim to demystify the black-box nature of deep generative modeling through the application of advanced mathematical tools. With a focus on innovation and precision, I am dedicated to pushing the boundaries of artificial intelligence and contributing to the advancement of cutting-edge technology.”

Publications

On the Language Encoder of Contrastive Cross-modal Models

ACL, 2024
Mengjie Zhao*, Junya Ono*, Zhi Zhong*, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Takashi Shibuya, Hiromi Wakaki*, Yuki Mitsufuji, Wei-Hsiang Liao

Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks. However, there has been limited investigation of and improvement in their language encoder, which is the central component of encoding natural language descri…
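
For context, contrastive cross-modal models such as CLIP and CLAP are typically trained with a symmetric InfoNCE-style objective over a batch of N paired embeddings. The sketch below is the standard form of that loss, stated for orientation only; the paper's contribution concerns the language encoder, not this objective:

\[
  \mathcal{L} = -\frac{1}{2N}\sum_{i=1}^{N}\left[
    \log\frac{\exp(s_{ii}/\tau)}{\sum_{j=1}^{N}\exp(s_{ij}/\tau)}
    + \log\frac{\exp(s_{ii}/\tau)}{\sum_{j=1}^{N}\exp(s_{ji}/\tau)}
  \right],
\]

where \(s_{ij}\) is the cosine similarity between the i-th visual (or audio) embedding and the j-th text embedding, and \(\tau\) is a learned temperature.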

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes

TMLR, 2024
Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Yuki Mitsufuji, Wei-Hsiang Liao

Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity recon…
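
As a point of reference, the deterministic quantization step in a standard VQ-VAE maps each encoder output to its nearest codebook entry; this is the textbook formulation rather than HQ-VAE's variational-Bayes extension:

\[
  z_q(x) = e_{k^{*}}, \qquad k^{*} = \arg\min_{k} \,\lVert z_e(x) - e_k \rVert_2,
\]

where \(z_e(x)\) is the encoder output and \(\{e_k\}\) is the discrete codebook.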

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance

ICASSP, 2024
Carlos Hernandez-Olivan*, Koichi Saito, Naoki Murata, Chieh-Hsin Lai, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji

Restoring degraded music signals is essential to enhance audio quality for downstream music manipulation. Recent diffusion-based music restoration methods have demonstrated impressive performance, and among them, diffusion posterior sampling (DPS) stands out given its intrin…
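
For background, diffusion posterior sampling guides an unconditional diffusion model toward measurements y by approximating the conditional score. The expression below is the standard DPS approximation (Chung et al.), included for orientation rather than as VRDMG's specific multi-guidance scheme:

\[
  \nabla_{x_t}\log p(x_t \mid y) \;\approx\; s_\theta(x_t, t) + \nabla_{x_t}\log p\big(y \mid \hat{x}_0(x_t)\big),
\]

where \(s_\theta\) is the learned score and \(\hat{x}_0(x_t)\) is the model's estimate of the clean signal given the noisy state \(x_t\).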

SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer

ICLR, 2024
Yuhta Takida, Masaaki Imaizumi*, Takashi Shibuya, Chieh-Hsin Lai, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji

Generative adversarial networks (GANs) learn a target probability distribution by optimizing a generator and a discriminator with minimax objectives. This paper addresses the question of whether such optimization actually provides the generator with gradients that make its d…
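
The minimax objective referred to here is the standard GAN formulation, reproduced for reference; SAN's modification of the discriminator's final linear layer is described in the paper itself:

\[
  \min_G \max_D \; \mathbb{E}_{x\sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z\sim p_z}\big[\log\big(1 - D(G(z))\big)\big].
\]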

Manifold Preserving Guided Diffusion

ICLR, 2024
Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter*, Ruslan Salakhutdinov*, Stefano Ermon*

Despite the recent advancements, conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training. In this paper, we propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework th…

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

ICLR, 2024
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon*

Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade off quality for speed. To address this limitation, we propose Consistency Trajectory Model (CTM), a generalization encomp…
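
The probability flow ODE named in the title is the deterministic counterpart of the forward diffusion \(dx_t = f(x_t,t)\,dt + g(t)\,dw_t\) (Song et al.); it is stated below for context, since CTM learns the trajectory of this ODE:

\[
  \frac{dx_t}{dt} = f(x_t, t) - \tfrac{1}{2}\, g(t)^2 \,\nabla_{x_t}\log p_t(x_t).
\]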

GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration

ICML, 2023
Naoki Murata, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Yuki Mitsufuji

Pre-trained diffusion models have been successfully used as priors in a variety of linear inverse problems, where the goal is to reconstruct a signal from noisy linear measurements. However, existing approaches require knowledge of the linear operator. In this paper, we prop…
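
The linear inverse problems mentioned here follow the usual noisy measurement model, written below for context; "blind" in the title refers to the setting where the operator A itself is unknown and must be inferred:

\[
  y = A x + n, \qquad n \sim \mathcal{N}(0, \sigma^2 I),
\]

where \(x\) is the clean signal, \(A\) the linear measurement operator, and \(n\) observation noise.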

FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation

ICML, 2023
Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon*

Score-based generative models learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are tied together by the Fokker-Planck equation (FPE), a partial differentia…
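
For reference, the Fokker-Planck equation tying the perturbed densities together is the standard one for a forward diffusion \(dx_t = f(x_t,t)\,dt + g(t)\,dw_t\); the paper enforces the corresponding identity satisfied by the scores \(\nabla_x \log p_t\):

\[
  \partial_t\, p_t(x) = -\nabla_x \!\cdot\! \big(f(x,t)\, p_t(x)\big) + \tfrac{1}{2}\, g(t)^2\, \Delta_x\, p_t(x).
\]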

Unsupervised vocal dereverberation with diffusion-based generative models

ICASSP, 2023
Koichi Saito, Naoki Murata, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuhta Takida, Takao Fukui*, Yuki Mitsufuji

Removing reverb from reverberant music is a necessary technique to clean up audio for downstream music manipulations. Reverberation in music falls into two categories: natural reverb and artificial reverb. Artificial reverb has a wider diversity than natural reverb due to its…

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

ICML, 2022
Yuhta Takida, Takashi Shibuya, Wei-Hsiang Liao, Chieh-Hsin Lai, Junki Ohmura*, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi*, Toshiyuki Kumakura*, Yuki Mitsufuji

One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, also known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some…
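
As a side note, codebook collapse is easy to quantify from the quantizer's code assignments. The snippet below is a minimal illustrative sketch (not from the paper) that reports the fraction of codes in use and the perplexity of the empirical code distribution, both of which fall well below the codebook size under collapse:

import numpy as np

def codebook_usage(code_indices: np.ndarray, codebook_size: int) -> tuple[float, float]:
    """Return (fraction of codebook entries used, perplexity of code usage).

    code_indices: integer indices assigned by the vector quantizer, shape (N,).
    A perplexity far below codebook_size signals codebook collapse.
    """
    counts = np.bincount(code_indices, minlength=codebook_size)
    probs = counts / counts.sum()
    used_fraction = float((counts > 0).mean())
    nonzero = probs[probs > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))   # entropy of the empirical code distribution
    return used_fraction, float(np.exp(entropy))   # perplexity = exp(entropy)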

Blog

May 10, 2024 | Events | Sony AI

Revolutionizing Creativity with CTM and SAN: Sony AI's Groundbreaking Advances in Generative AI for Creators

In the dynamic world of generative AI, the quest for more efficient, versatile, and high-quality models continues unabated. At the forefront of this technological evolution are Sony AI's r…
