New Research at ACL 2025 Tackles Real-World Translation Challenges

Events

Sony AI

July 29, 2025

Introduction

Language is more than words; it is tied to memory, culture, and identity. And nowhere is this more evident than in the challenges of machine translation. At ACL 2025, researchers from Sony AI tackle these challenges head-on with two novel approaches that aim to improve how language models handle nuanced, underrepresented languages.

In the first paper, the team introduces IdiomCE, a graph neural network (GNN)-based system designed to improve idiomatic translation across Indian languages—where a phrase like “bury the hatchet” must do more than survive translation; it must make cultural sense in Hindi, Tamil, Telugu, or Bengali.

Rather than relying on word-by-word translation, the model maps idioms across cultures, using shared context and historical significance to create natural, figurative translations.

In the second paper, the researchers shift focus to African languages and specialized translation domains like news, religion, and film. Here, they use multi-armed bandit algorithms—a method rooted in reinforcement learning—to dynamically select the best translation model on the fly, especially when data is scarce and urgency is high. Whether you're interpreting a health request in Swahili or subtitling a Yoruba film, this approach ensures the system adapts intelligently without needing retraining.

Together, these papers address a growing global concern: how to build translation systems that aren’t just technically sound, but culturally fluent and contextually adaptive.

Beating the Odds: How Smart Algorithms Are Powering Better African Language Translations

Introduction

Imagine you're building a translation system for a multilingual chatbot used across Africa. One moment, it's answering health-related questions in Swahili. The next, it's translating news headlines in Igbo or entertainment content in Yoruba.

Each domain requires a different tone, vocabulary, and translation style, and picking the wrong model for the moment can produce awkward or inaccurate results.

That’s the problem researchers from Sony AI are trying to solve in their paper, In-Domain African Languages Translation Using LLMs and Multi-armed Bandits (ACL 2025). The team explores how to choose the best machine translation model when you:

  • have very little data,
  • are working in specialized domains like healthcare, religious texts, news, or entertainment,
  • and need the best possible output.

Their solution? Let a smart algorithm do the picking.

What Is This Research About?

Machine translation models are a lot like chefs in a kitchen. Each one is trained differently. Some are good at spicy food (news), others excel at comfort food (healthcare). If you randomly assign them to cook everything, some meals will fall flat.

This research treats the act of choosing the right “chef” (translation model) as a multi-armed bandit problem, a concept from reinforcement learning. Picture that row of chefs again, each one representing a different translation model. You don’t know which will give you the best payout (i.e., the best translation), but you learn by selecting one at a time and observing the result.

“We take a more principled way to dynamically choose the machine translation model on-the-fly by treating the model selection process as a multi-armed bandit problem.”

They tested this idea using four bandit algorithms (a minimal code sketch of the first follows the list):

  1. UCB (Upper Confidence Bound): picks the model with the best balance of observed performance and remaining uncertainty.
  2. Thompson Sampling: like playing poker with probability; it samples possible outcomes and picks the most promising model.
  3. Linear UCB (LinUCB): assumes performance is a linear function of features describing the input.
  4. Neural LinUCB: a version of LinUCB that uses a neural network to learn richer feature patterns.
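To make the idea concrete, here is a minimal sketch of the classic UCB1 strategy applied to model selection. The names `models`, `sentences`, and `reward_fn` are illustrative stand-ins (the reward could be, for example, a sentence-level BLEU score), not the paper's actual code:

```python
# A minimal UCB1 loop for picking among K translation models.
# `models`, `sentences`, and `reward_fn` are illustrative stand-ins
# (reward could be, e.g., sentence-level BLEU), not the paper's code.
import math
import random

def ucb1_select(counts, values, t):
    """Pick the arm with the best mean-plus-uncertainty score."""
    for k, n in enumerate(counts):
        if n == 0:                      # try every model at least once
            return k
    return max(
        range(len(counts)),
        key=lambda k: values[k] + math.sqrt(2 * math.log(t) / counts[k]),
    )

def run_bandit(models, sentences, reward_fn, rounds=1000):
    counts = [0] * len(models)          # pulls per model
    values = [0.0] * len(models)        # running mean reward per model
    for t in range(1, rounds + 1):
        k = ucb1_select(counts, values, t)
        src, ref = random.choice(sentences)
        reward = reward_fn(models[k](src), ref)
        counts[k] += 1
        values[k] += (reward - values[k]) / counts[k]
    return values                       # best arm = argmax of values
```

Each round, the algorithm favors models with a high observed mean reward but still revisits under-explored ones; that is the exploration-exploitation balance the authors describe.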

Each sentence to be translated is first encoded using a multilingual sentence embedding model called LaBSE, which helps describe the sentence’s features, just like chefs might use a recipe card to decide if a dish is more savory or sweet. “Each source sentence x is passed through a Language-agnostic BERT Sentence Encoder (LaBSE)... This vector x acts as the context vector in the contextual bandit algorithms.”
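For readers who want to see how that context vector plugs in, below is a rough sketch of the standard (disjoint) LinUCB update with a LaBSE embedding as the context. It assumes the open-source sentence-transformers package and its public LaBSE checkpoint; the class is a textbook LinUCB illustration, not the paper's implementation:

```python
# Rough sketch of disjoint LinUCB with a LaBSE sentence embedding as
# the context vector. Assumes the open-source sentence-transformers
# package; the update rules are the textbook LinUCB equations, not
# the paper's implementation.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/LaBSE")

class LinUCBArm:
    """Per-model ridge-regression state for contextual selection."""
    def __init__(self, dim=768, alpha=1.0):
        self.A = np.eye(dim)            # regularized design matrix
        self.b = np.zeros(dim)          # reward-weighted context sum
        self.alpha = alpha              # exploration strength

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b          # estimated reward weights
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# The source sentence's embedding is the context x: score every arm,
# translate with the highest-scoring model, then update that arm.
x = encoder.encode("Habari za asubuhi?")  # 768-dim context vector
```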

Why It Matters

Translation isn’t one-size-fits-all. What works in a news article might sound off in a religious sermon or a romantic movie scene.

Traditionally, you’d “fine-tune” a model with domain-specific examples. But for African languages like Yoruba, Igbo, or Swahili, those examples are scarce. That’s like asking a chef to specialize in Thai cuisine without ever having tasted lemongrass.

This paper sidesteps the problem: instead of fine-tuning, just let the system pick the right model for each domain, even with very little data.

“Our method effectively balances exploration and exploitation... enabling optimal model selection with high confidence,” the researchers explain. Even better, this approach works with or without reference translations (i.e., “gold standard” answers).

“Our method can effectively be applied to model selection in target-free domain translation... where reference translations are not available,” the authors note. In other words: even if you can’t see the answer key, the system learns which model is doing the job well—by watching patterns.

Examples from the Paper

Here are some clear results from the paper that show how this approach pays off:


Example 1: News Articles, Igbo

In the News domain for Igbo, the bandit algorithm UCB beat the best known model (NLLB) by a small but measurable margin:

  • NLLB BLEU score: 19.73
  • UCB BLEU score: 19.83

“UCB in the News domain for Igbo... surpass[es] the performance of the best possible NMT model.” This may sound small, but in translation, every fraction counts—especially when you’re working with so little data.


Example 2: Religious Texts, Swahili

Religious language is especially nuanced. Translating concepts like “grace” or “devotion” isn’t always straightforward. The Thompson Sampling algorithm picked a better model than the best baseline here too:

  • NLLB BLEU score: 28.01
  • Thompson Sampling BLEU score: 32.54

“Thompson Sampling (TS) in... Religious, and Swahili... result[s] in slight but notable improvements in translation quality.” That’s a gain of over 4 BLEU points, without any additional training.

Example 3: Movie Dialogue, Yoruba (Target-Free)

What if you don’t even have “correct” translations to compare to? This is common in under-resourced languages. The researchers used CometKiwi, a metric that evaluates quality without a reference translation.

UCB and LinUCB still managed to pick high-performing models, showing that the method works even when you're flying blind. “In the absence of target translations... the Bandit-based approaches successfully identify the best arms... underscoring the flexibility of our approach.”
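In code, switching to the target-free setting only changes the reward signal fed to the bandit loop above. The sketch below swaps BLEU for a reference-free quality-estimation score; `quality_estimate` is a hypothetical stand-in for a CometKiwi-style scorer:

```python
# Reference-free reward for the same bandit loop: score the
# (source, hypothesis) pair directly. `quality_estimate` is a
# hypothetical stand-in for a CometKiwi-style scorer.
def target_free_reward(model, src, quality_estimate):
    hyp = model(src)                    # candidate translation
    return quality_estimate(src, hyp)   # no gold reference needed
```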

Results at a Glance

Across 3 languages (Igbo, Yoruba, Swahili) and 3 domains (News, Movies, Religion), the bandit methods—especially UCB—were consistently strong:

“UCB... outperforms the best model, NLLB, by an average improvement of 2.68% in BLEU score.”

The method is also lightweight:

  • No fine-tuning required
  • No need for large datasets
  • No full retraining for each language pair

It's like adding a smart assistant who learns which translator is best for each topic—without slowing anything down.

Final Thoughts

This paper offers an elegant solution to a very real problem: how to make machine translation smarter when working with the languages that need it most. Instead of building new models for every situation, just learn which model is best to use, and when.

“Our findings highlight the potential of bandit-based methods to improve NMT performance in resource-constrained environments.”

As machine learning systems aim to be more inclusive, solutions like this are crucial for reaching speakers of the world’s underrepresented languages, without asking them to wait for years of data collection first.

Translating More Than Words: How Graph Neural Networks Help Idioms Speak Across Cultures

Introduction

Idioms like “spill the beans” or “hit the sack” can paint vivid pictures, but only if you understand the cultural canvas they come from. Translating them into other languages, especially across culturally diverse regions like India, is notoriously difficult. “Idioms have key properties such as noncompositionality, fixedness, and cultural specificity... making them unique to specific languages or regions,” the team explains.

A literal translation often fails to convey the emotion, intent, or nuance behind the expression. Enter IdiomCE, a research effort from Sony AI that proposes a smarter, culturally aware way to handle idiomatic translation across Indian languages.

What Is This Research?

The paper introduces a system that uses Graph Neural Networks (GNNs) and cultural metadata to improve idiomatic translation from English to multiple Indian languages. “We propose IdiomCE, an adaptive graph neural network (GNN) based methodology that learns intricate mappings between idiomatic expressions… facilitating improved idiomatic translation in smaller models.”

In practice, rather than translating idioms word for word, the model builds a knowledge graph that links idioms based on shared concepts, values, and historical or situational contexts. This allows the system to understand meaning at a deeper level and even handle idioms it hasn’t seen before.
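As a toy illustration of what such a structure might look like (every node and edge here is invented for clarity, not taken from the paper's graph), consider a small networkx graph:

```python
# Toy idiom graph: nodes are idioms tagged with language and concept,
# edges link expressions that share a concept. Every entry here is
# invented for illustration, not drawn from the paper's data.
import networkx as nx

G = nx.Graph()
G.add_node("bury the hatchet", lang="en", concept="reconciliation")
G.add_node("गिले-शिकवे मिटाना", lang="hi", concept="reconciliation")
G.add_node("spill the beans", lang="en", concept="revealing a secret")
G.add_edge("bury the hatchet", "गिले-शिकवे मिटाना", relation="shared_concept")

# A GNN trained over such a graph can propagate information between
# neighbors, nudging even unseen idioms toward culturally close
# counterparts instead of literal renderings.
```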

The model also supports “pivot-based translation”, using English as a bridge to translate between Indian languages like Hindi and Tamil, without needing a direct model for every pair. “Using English as a pivot language, we extend our approach to facilitate idiomatic translation across Indic languages without needing to train GNN models between all possible pairs of languages,” the researchers explain.
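A pivot step can then be thought of as two hops through that graph. The helper below is purely hypothetical and only illustrates the flow:

```python
# Hypothetical two-hop pivot through the idiom graph: Hindi -> English
# -> Tamil. `nearest_idiom` stands in for a learned graph lookup and
# is not a function from the paper.
def pivot_translate(idiom_hi, nearest_idiom):
    idiom_en = nearest_idiom(idiom_hi, target_lang="en")  # hi -> en hop
    return nearest_idiom(idiom_en, target_lang="ta")      # en -> ta hop
```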

Why It Matters

This kind of translation matters for more than just accuracy: it’s about communication, empathy, and connection. Whether you're building a chatbot, a customer service platform, or educational software, conveying meaning that feels natural to users in their native language is essential.

“[Idioms] often originate from diverse cultural, historical, and situational contexts… making them integral to everyday language.”

In real-world settings, such as chatbots for government services, healthcare apps, or educational platforms, a model that captures idiomatic meaning, rather than just literal words, can prevent confusion and even enhance trust. Traditional models fail in this regard. As the researchers note, “They still fail to overcome key challenges… overlook cultural factors… and fail to address the one-to-many nature of idioms.”

Examples from the Paper

  1. “Bury the hatchet.”

This idiom typically means to make peace or resolve a conflict. A literal translation might offer something like सौदा पटाना (make a deal), which misses the emotional nuance.

IdiomCE Translation: गिले-शिकवे मिटाना — a Hindi idiom meaning to forget grievances.

This pairing is shown in the paper as part of the graph construction between English and Hindi idioms: “Bury the hatchet → गिले-शिकवे मिटाना” (Figure 2, Section 3.2).

This translation preserves both intent and cultural relevance, aligning with the goal of the system to “generate translations that are both contextually and culturally relevant.”


  2. “It’s all Greek to me.”

Used when someone doesn’t understand what they’re reading or hearing, this idiom often trips up translation models.

IdiomCE Translation: मेरे सर के ऊपर से गया — literally “it went over my head.”
This phrase is more natural and widely used in Hindi to convey confusion or lack of understanding.

“Well, it's all Greek to me...”
→ “मेरे सर के ऊपर से गया...”
(Appendix B.1: Translation Example en-hi direction)

The translation shows that IdiomCE is capable of selecting idioms that resonate with native speakers while preserving figurative meaning.

Results

The authors tested IdiomCE on both seen and unseen idioms across four Indian languages (Hindi, Tamil, Bengali, Telugu), and found that it consistently outperformed direct prompting and previous benchmarks like IdiomKB.

“On average, IdiomCE improves LLM-eval scores by 18.51% for en-hi, 14.71% for en-bn, 6.45% for en-ta, and 10.33% for en-te,” the authors note.

Even smaller models benefited from the approach: “With IdiomCE, very small models like Llama 3.2 3B perform comparably to the directly prompted larger Llama 3.1 8B variant.”

And in human evaluations conducted with 19 native speakers: “IdiomCE consistently outperformed the other baselines... The performance gap was especially pronounced in the en-hi and en-bn directions.”

With India’s linguistic diversity, deploying IdiomCE in tools such as Google Translate, WhatsApp’s in-chat translators, or YouTube’s subtitle generators could markedly improve accuracy and user trust while helping brands and services stay culturally sensitive. It’s also valuable for language learners, content creators, and educators aiming to preserve cultural context while reaching multilingual audiences.

Conclusion

The work presented by Sony AI at ACL 2025 highlights a critical shift in machine translation research—from building one-size-fits-all systems to developing adaptive, culturally aware methods that prioritize real-world usability.

In IdiomCE, we see the value of preserving figurative language and cultural resonance through graph-based learning. In the multi-armed bandit approach, we witness the practical power of letting algorithms choose the right model for the moment—especially in low-resource, high-stakes scenarios.

At their core, both papers propose translation strategies that listen before they speak—recognizing the deep, contextual layers behind how people communicate. These solutions point toward a future where machine translation not only understands what’s being said, but how, why, and to whom. In a multilingual world, that kind of understanding isn’t just helpful: it’s essential.

Dive into both research papers below:

Graph-Assisted Culturally Adaptable Idiomatic Translation for Indic Languages

In-Domain African Languages Translation Using LLMs and Multi-armed Bandits
