Enhancing Whisper's Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization
Pankaj Wasnik
Kumud Tripathi
Raj Gothi
ICASSP-25
2025
Abstract
Automatic speech recognition has recently seen a significant advancement with large foundational models such as Whisper. However, these models often struggle to perform well in low-resource languages, such as Indian languages. This paper explores two novel approaches to enhance Whisper's multilingual speech recognition performance in Indian languages. First, we propose prompt-tuning with language family information, which enhances Whisper's accuracy in linguistically similar languages. Second, we introduce a novel tokenizer that reduces the number of generated tokens, thereby accelerating Whisper's inference speed. Our extensive experiments demonstrate that the tokenizer significantly reduces inference time, while prompt-tuning enhances accuracy across various Whisper model sizes, including Small, Medium, and Large. Together, these techniques achieve a balance between optimal WER and inference speed.
Related Publications
Real-world object detection systems, such as those in autonomous driving and surveillance, must continuously learn new object categories and simultaneously adapt to changing environmental conditions. Existing approaches, Class Incremental Object Detection (CIOD) and Domain I…
Precise Event Spotting (PES) aims to identify events and their class from long, untrimmed videos, particularly in sports. The main objective of PES is to detect the event at the exact moment it occurs. Existing methods mainly rely on features from a large pre-trained network…
Ensembling neural machine translation (NMT) models to produce higher-quality translations than the $L$ individual models has been extensively studied. Recent methods typically employ a candidate selection block (CSB) and an encoder-decoder fusion block (FB), requiring infere…
JOIN US
Shape the Future of AI with Sony AI
We want to hear from those of you who have a strong desire
to shape the future of AI.