Iteratively Improving Speech Recognition and Voice Conversion

Interspeech, 2023
Mayank Kumar Singh*, Naoya Takahashi, Onoe Naoyuki*

Many existing works on voice conversion (VC) tasks use automatic speech recognition (ASR) models for ensuring linguistic consistency between source and converted samples. However, for the low-data resource domains, training a high-quality ASR remains to be a challenging task…

The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation

Ryosuke Sawata*, Naoya Takahashi, Stefan Uhlich*, Shusuke Takahashi*, Yuki Mitsufuji

This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) without increasing calculation cost. It consists of three components: (i) multi-domain loss (MDL), (ii) bridging operation, which…

Nonparallel Emotional Voice Conversion for unseen speaker-emotion pairs using dual domain adversarial network Virtual Domain …

ICASSP, 2023
Nirmesh Shah*, Mayank Kumar Singh*, Naoya Takahashi, Naoyuki Onoe*

Primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another style without modifying the linguistic content of the signal. Most of the state-of-the-art approaches convert emotions for seen speaker-emo…


Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.