Kengo Uchida
Publications
MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
CVPR, 2025 | Kengo Uchida, Takashi Shibuya, Yuhta Takida, Naoki Murata, Julian Tanke, Shusuke Takahashi*, Yuki Mitsufuji
In text-to-motion generation, controllability as well as generation quality and speed has become increasingly critical. The controllability challenges include generating a motion of a length that matches the given textual description and editing the generated motions accordi...
HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
TMLR, 2024 | Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Yuki Mitsufuji, Wei-Hsiang Liao
Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity recon...
Zero- and Few-shot Sound Event Localization and Detection
ICASSP, 2024 | Kazuki Shimada, Kengo Uchida, Yuichiro Koyama*, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji, Tatsuya Kawahara*
Sound event localization and detection (SELD) systems estimate direction-of-arrival (DOA) and temporal activation for sets of target classes. Neural network (NN)-based SELD systems have performed well in various sets of target classes, but they only output the DOA and tempor...
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
NEURIPS, 2023 | Kazuki Shimada, Archontis Politis*, Parthasaarathy Sudarsanam*, Daniel Krause*, Kengo Uchida, Sharath Adavann*, Aapo Hakala*, Yuichiro Koyama*, Naoya Takahashi, Shusuke Takahashi*, Tuomas Virtanen*, Yuki Mitsufuji
While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker. This paper pro...