Authors

* External authors

Venue

Date

Open-Set Object Detection By Aligning Known Class Representations

Vishal Chudasama

Naoyuki Onoe*

Pankaj Wasnik

Hiran Sarkar

Vineeth N Balasubramanian

WACV-24

2025

Abstract

Open-Set Object Detection (OSOD) has emerged as a contemporary research direction to address the detection of unknown objects. Recently, a few works have achieved remarkable performance in the OSOD task by employing contrastive clustering to separate unknown classes. In contrast, we propose a new semantic clustering-based approach to facilitate a meaningful alignment of clusters in semantic space and introduce a class decorrelation module to enhance inter-cluster separation. Our approach further incorporates an object focus module to predict objectness scores, which enhances the detection of unknown objects. Further, we employ i) an evaluation technique that penalizes low-confidence outputs to mitigate the risk of misclassifying unknown objects and ii) a new metric called HMP that combines known and unknown precision using the harmonic mean. Our extensive experiments demonstrate that the proposed model achieves significant improvement on the MS-COCO and PASCAL VOC datasets for the OSOD task.
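
As an illustration of the HMP idea mentioned in the abstract, the following is a minimal sketch of a harmonic mean of two precision values. The function name `harmonic_mean_precision` and its arguments are hypothetical; the abstract does not state how known and unknown precision are computed (for example, at which confidence or IoU thresholds), so the sketch assumes both values are already available as numbers in [0, 1].

```python
def harmonic_mean_precision(known_precision: float, unknown_precision: float) -> float:
    """Harmonic mean of two precision values in [0, 1].

    Assumption: this is a generic harmonic mean, not necessarily the exact
    HMP definition used by the authors.
    """
    for p in (known_precision, unknown_precision):
        if not 0.0 <= p <= 1.0:
            raise ValueError("precision values must lie in [0, 1]")
    total = known_precision + unknown_precision
    if total == 0.0:
        # Both precisions are zero; define the combined score as zero
        # instead of dividing by zero.
        return 0.0
    return 2.0 * known_precision * unknown_precision / total


if __name__ == "__main__":
    # A detector that is strong on known classes but weak on unknown ones
    # scores well below the arithmetic mean of the two values.
    print(harmonic_mean_precision(0.8, 0.2))
```

A harmonic mean is a natural choice here because it stays low unless both inputs are high, so a model cannot obtain a good combined score by doing well on known classes alone.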

Related Publications

REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion

Interspeech, 2026
Ishan Biyani, Nirmesh Shah*, Ashishkumar Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah

Speech time reversal refers to the process of reversing the entire speech signal in time, causing it to play backward. Such signals are completely unintelligible since the fundamental structures of phonemes and syllables are destroyed. However, they still retain tonal patter…

Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation

AAAI, 2025
Kumud Tripathi, Aditya Srinivas Menon, Aman Gupta, Raj Prakash Gohil, Pankaj Wasnik

The Whisper model, an open-source automatic speech recognition system, is widely adopted for its strong performance across multilingual and zero-shot settings. However, it frequently suffers from hallucination errors, especially under noisy acoustic conditions. Previous work…

In-Domain African Languages Translation Using LLMs and Multi-armed Bandits

ACL, 2025
Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik

Neural Machine Translation (NMT) systems face significant challenges when working with low-resource languages, particularly in domain adaptation tasks. These difficulties arise due to limited training data and suboptimal model generalization. As a result, selecting an opti- …
