Junghyun Koo
Profile
Junghyun (Tony) Koo is a research scientist on the AI for Creators team at Sony AI. He received his Ph.D. from Seoul National University in South Korea, with a dissertation focused on applying deep neural networks for style transfer of audio effects, particularly in music post-production tasks such as mixing and mastering. During his Ph.D., Tony gained industry experience through research internships at Mitsubishi Electric Research Laboratories (MERL), Sony Tokyo R&D Center, and Supertone. He holds a Bachelor of Science in Information and Communication Engineering from Inha University in South Korea.
Publications
LLM2Fx-Tools: Tool Calling For Music Post-Production
ICLR, 2026 | Seungheon Doh*, Junghyun Koo, Marco A. Martínez-Ramírez, Woosung Choi, Wei-Hsiang Liao, Qiyu Wu*, Juhan Nam*, Yuki Mitsufuji
This paper introduces LLM2Fx-Tools, a multimodal tool-calling framework that generates executable sequences of audio effects (Fx-chain) for music post-production. LLM2Fx-Tools uses a large language model (LLM) to understand audio inputs, select audio effect types, determine...
Concept-TRAK: Understanding How Diffusion Models Learn Concepts through Concept-Level Attribution
ICLR, 2026 | Yonghyun Park*, Chieh-Hsin Lai, Satoshi Hayakawa*, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Woosung Choi, Kin Wai Cheuk, Junghyun Koo, Yuki Mitsufuji
While diffusion models excel at image generation, their growing adoption raises critical concerns around copyright issues and model transparency. Existing attribution methods identify training examples influencing an entire image, but fall short in isolating contributions to...
Automatic Music Mixing Using a Generative Model of Effect Embeddings
ICASSP, 2026 | Eloi Moliner, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Kin Wai Cheuk, Joan Serrà, Vesa Välimäki*, Yuki Mitsufuji
Music mixing involves combining individual tracks into a cohesive mixture, a task characterized by subjectivity where multiple valid solutions exist for the same input. Existing automatic mixing systems treat this task as a deterministic regression problem, thus ignoring thi...
Towards Blind Data Cleaning: A Case Study in Music Source Separation
ICASSP, 2026 | Azalea Gui, Woosung Choi, Junghyun Koo, Kazuki Shimada, Takashi Shibuya, Joan Serrà, Wei-Hsiang Liao, Yuki Mitsufuji
The performance of deep learning models for music source separation heavily depends on training data quality. However, datasets are often corrupted by difficult-to-detect artifacts such as audio bleeding and label noise. Since the type and extent of contamination are typical...
Large-Scale Training Data Attribution for Music Generative Models via Unlearning
NEURIPS, 2025 | Woosung Choi, Junghyun Koo, Kin Wai Cheuk, Joan Serrà, Marco A. Martínez-Ramírez, Yukara Ikemiya, Naoki Murata, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji
This paper explores the use of unlearning methods for training data attribution (TDA) in music generative models trained on large-scale datasets. TDA aims to identify which specific training data points contributed to the generation of a particular output from a specific mod...
DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions
DAFX, 2025 | Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Ben Hayes, Wei-Hsiang Liao, György Fazekas, Yuki Mitsufuji
This study introduces a novel and interpretable model, DiffVox, for matching vocal effects in music production. DiffVox, short for "Differentiable Vocal Fx", integrates parametric equalisation, dynamic range control, delay, and reverb with efficient differentiable implement...
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
WASPAA, 2025 | Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Yuki Mitsufuji, György Fazekas
Style Transfer with Inference-Time Optimisation (ST-ITO) is a recent approach for transferring the applied effects of a reference audio to a raw audio track. It optimises the effect parameters to minimise the distance between the style embeddings of the processed audio and t...
Can Large Language Models Predict Audio Effects Parameters from Natural Language?
WASPAA, 2025 | Seungheon Doh*, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Juhan Nam*, Yuki Mitsufuji
In music production, manipulating audio effects (Fx) parameters through natural language has the potential to reduce technical barriers for non-experts. We present LLM2Fx, a framework leveraging Large Language Models (LLMs) to predict Fx parameters directly from textual desc...
Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures
ISMIR, 2025 | Yen-Tung Yeh, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yi-Hsuan Yang, Yuki Mitsufuji
General-purpose audio representations have proven effective across diverse music information retrieval applications, yet their utility in intelligent music production remains limited by insufficient understanding of audio effects (Fx). Although previous approaches have empha...
ITO-Master: Inference-Time Optimization for Audio Effects Modeling of Music Mastering Processors
ISMIR, 2025 | Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro*, Michele Mancusi, Yuki Mitsufuji
Music mastering style transfer aims to model and apply the mastering characteristics of a reference track to a target track, simulating the professional mastering process. However, existing methods apply fixed processing based on a reference track, limiting users' ability to...
VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression
ICASSP, 2025 | Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo, Yukara Ikemiya, Zhi Zhong*, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee*, Wei-Hsiang Liao, Yuki Mitsufuji
Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of rate-distortion tradeoff, particularly ...
Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer
ICASSP, 2025 | Michele Mancusi, Yurii Halychanskyi, Kin Wai Cheuk, Eloi Moliner, Chieh-Hsin Lai, Stefan Uhlich*, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro*, Yuki Mitsufuji
Music timbre transfer is a challenging task that involves modifying the timbral characteristics of an audio signal while preserving its melodic structure. In this paper, we propose a novel method based on dual diffusion bridges, trained using the CocoChorales Dataset, which ...
Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
ICASSP, 2023 | Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich*, Kyogu Lee*, Yuki Mitsufuji
We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio effects related information from a r...