Sony AI Publications | Research Papers & Scientific Findings

ARXIV, 2026

Woosh: A Sound Effects Foundation Model

Gaëtan Hadjeres, Marc Ferras, Khaled Koutini, Benno Weck, Alexandre Bittar, Thomas Hummel, Zineb Lahrichi, Hakim Missoum, Joan Serrà, Yuki Mitsufuji

Woosh is Sony AI's open sound effects foundation model featuring high-quality audio encoding, text-to-audio, and video-to-audio generation. Optimized for sound effects, it offers competitive performance against models like StableAudio-Open and TangoFlux, with distilled models for fast, low-resource inference.

CHI, 2026

Emergent, not Immanent: A Baradian Reading of Explainable AI

Fabio Morreale, Joan Serrà, Yuki Mitsufuji

This paper challenges conventional assumptions in Explainable AI (XAI) by applying Barad's agential realism, arguing that AI interpretations are not fixed within models but emerge from dynamic entanglements of humans, context, and technology. It critiques existing XAI methods and proposes ethical design directions for interfaces that support emergent interpretation, illustrated through a speculative text-to-music case study.

IEEE, 2026

Diffusion-based Signal Refiner for Speech Enhancement and Separation

Ryosuke Sawata, Masato Hirano*, Naoki Murata, Shusuke Takahashi*, Yuki Mitsufuji

Although recent speech processing technologies have achieved significant improvements in objective metrics, there still remains a gap in human perceptual quality. This paper proposes Diffiner, a novel solution that utilizes the powerful generative capability of diffusion mod...

ICASSP, 2026

S-PRESSO: Ultra Low Bitrate Sound Effect Compression With Diffusion Autoencoders And Offline Quantization

Zineb Lahrichi, Gaëtan Hadjeres, Gaël Richard, Geoffroy Peeters

Neural audio compression models have recently achieved extreme compression rates, enabling efficient latent generative modeling. Conversely, latent generative models have been applied to compression, pushing the limits of continuous and discrete approaches. However, existing...

CVPR, 2026

PAVAS: Physics-Aware Video-to-Audio Synthesis

Oh Hyun-Bin*, Yuhta Takida, Toshimitsu Uesaka, Tae-Hyun Oh*, Yuki Mitsufuji

Recent advances in Video-to-Audio (V2A) generation have achieved impressive perceptual quality and temporal synchronization, yet most models remain appearance-driven, capturing visual-acoustic correlations without considering the physical factors that shape real-world sounds...

CVPR, 2026

MeanFlow Transformers with Representation Autoencoders

Zheyuan Hu*, Chieh-Hsin Lai, Ge Wu*, Yuki Mitsufuji, Stefano Ermon*

MeanFlow (MF) is a diffusion-motivated generative model that enables efficient few-step generation by learning long jumps directly from noise to data. In practice, it is often used as a latent MF by leveraging the pre-trained Stable Diffusion variational autoencoder (SD-VAE)...

INTERSPEECH, 2026

REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion

Ishan Biyani, Nirmesh Shah*, Ashishkumar Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah

Speech time reversal refers to the process of reversing the entire speech signal in time, causing it to play backward. Such signals are completely unintelligible since the fundamental structures of phonemes and syllables are destroyed. However, they still retain tonal patter...

ICLR, 2026

GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data

Zhiteng Li, Lele Chen, Jerone Andrews, Yunhao Ba, Yulun Zhang, Alice Xiang

We propose a generative agent that augments training datasets with synthetic data for model fine-tuning. Unlike prior work, which uniformly samples synthetic data, our agent iteratively generates relevant samples on-the-fly, aligning with the target distribution. It prioritizes...

ICLR, 2026

From Neural Networks to Logical Theories: The Correspondence between Fibring Modal Logics and Fibring Neural Networks

Ouns El Harzli, Bernardo Cuenca Grau*, Artur d'Avila Garcez*, Ian Horrocks, Tarek R Besold

Fibring of modal logics is a well-established formalism for combining countable families of modal logics into a single fibred language with common semantics, characterized by fibred models. Inspired by this formalism, fibring of neural networks was introduced as a neurosymbo...

ICLR, 2026

Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models

Kevin Rojas*, Ye He*, Chieh-Hsin Lai, Yuhta Takida, Yuki Mitsufuji, Molei Tao*

Classifier-Free Guidance (CFG) is a widely used technique for conditional generation and improving sample quality in continuous diffusion models, and recent works have extended it to discrete diffusion. This paper theoretically analyzes CFG in the context of masked discrete ...

ICLR, 2026

3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation

JoungBin Lee*, Jaewoo Jung*, Jisang Han*, Takuya Narihira, Kazumi Fukuda, Junyoung Seo*, Sunghwan Hong*, Yuki Mitsufuji, Seungryong Kim*

We present 3DScenePrompt, a framework that generates the next video chunk from arbitrary-length input while enabling precise camera control and preserving scene consistency. Unlike methods conditioned on a single image or a short clip, we employ dual spatio-temporal conditio...

ICLR, 2026

LLM2Fx-Tools: Tool Calling For Music Post-Production

Seungheon Doh*, Junghyun Koo, Marco A. Martínez-Ramírez, Woosung Choi, Wei-Hsiang Liao, Qiyu Wu*, Juhan Nam*, Yuki Mitsufuji

This paper introduces LLM2Fx-Tools, a multimodal tool-calling framework that generates executable sequences of audio effects (Fx-chain) for music post-production. LLM2Fx-Tools uses a large language model (LLM) to understand audio inputs, select audio effects types, determine...

ICLR, 2026

SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator

Yuhta Takida, Satoshi Hayakawa*, Takashi Shibuya, Masaaki Imaizumi*, Naoki Murata, Bac Nguyen, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuki Mitsufuji

Deep generative models have made significant advances in generating complex content, yet conditional generation remains a fundamental challenge. Existing conditional generative adversarial networks often struggle to balance the dual objectives of assessing authenticity and c...

ICLR, 2026

Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment

Bac Nguyen, Yuhta Takida, Naoki Murata, Chieh-Hsin Lai, Toshimitsu Uesaka, Stefano Ermon*, Yuki Mitsufuji

Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots and image content. We propose Contrastive Object-centric Diffusion Alignment (CODA), ...

ICLR, 2026

Concept-TRAK: Understanding How Diffusion Models Learn Concepts through Concept-Level Attribution

Yonghyun Park*, Chieh-Hsin Lai, Satoshi Hayakawa*, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Woosung Choi, Kin Wai Cheuk, Junghyun Koo, Yuki Mitsufuji

While diffusion models excel at image generation, their growing adoption raises critical concerns around copyright issues and model transparency. Existing attribution methods identify training examples influencing an entire image, but fall short in isolating contributions to...

ICLR, 2026

CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models

Zheyuan Hu*, Chieh-Hsin Lai, Yuki Mitsufuji, Stefano Ermon*

Flow map models such as Consistency Models (CM) and Mean Flow (MF) enable few-step generation by learning the long jump of the ODE solution of diffusion models, yet training remains unstable, sensitive to hyperparameters, and costly. Initializing from a pre-trained diffusion...

ICLR, 2026

VIRTUE: Visual-Interactive Text-Image Universal Embedder

Wei-Yao Wang*, Kazuya Tateishi*, Qiyu Wu*, Shusuke Takahashi*, Yuki Mitsufuji

Multimodal representation learning models have demonstrated successful operation across complex tasks, and the integration of vision-language models (VLMs) has further enabled embedding models with instruction-following capabilities. However, existing embedding models lack v...

ICLR, 2026

Tracing the Principles Behind Modern Diffusion Models

Chieh-Hsin Lai, Yang Song*, Dongjun Kim*, Yuki Mitsufuji, Stefano Ermon*

Diffusion models can feel like a jungle of acronyms, but the core idea is simple: start from noise and gradually move a cloud of samples until it looks like real data. This post gives an intuition-first tour showing that DDPMs, score-based models, and flow matching are the s...

ICASSP, 2026

FoleyBench: A Benchmark For Video-to-Audio Models

Satvik Dixit, Koichi Saito, Zhi Zhong*, Yuki Mitsufuji, Chris Donahue

Video-to-audio generation (V2A) is of increasing importance in domains such as film post-production, AR/VR, and sound design, particularly for the creation of Foley sound effects synchronized with on-screen actions. Foley requires generating audio that is both semantically a...

ICASSP, 2026

Automatic Music Mixing Using a Generative Model of Effect Embeddings

Eloi Moliner, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Kin Wai Cheuk, Joan Serrà, Vesa Välimäki*, Yuki Mitsufuji

Music mixing involves combining individual tracks into a cohesive mixture, a task characterized by subjectivity where multiple valid solutions exist for the same input. Existing automatic mixing systems treat this task as a deterministic regression problem, thus ignoring thi...

ICASSP, 2026

Automatic Music Sample Identification with Multi-Track Contrastive Learning

Alain Riou, Joan Serrà, Yuki Mitsufuji

Sampling, the technique of reusing pieces of existing audio tracks to create new music content, is a very common practice in modern music production. In this paper, we tackle the challenging task of automatic sample identification, that is, detecting such sampled content and...

ICASSP, 2026

Leveraging Whisper Embeddings for Audio-based Lyrics Matching

Eleonora Mancini*, Joan Serrà, Paolo Torroni*

Audio-based lyrics matching can be an appealing alternative to other content-based retrieval approaches, but existing methods often suffer from limited reproducibility and inconsistent baselines. In this work, we introduce WEALY, a fully reproducible pipeline that leverages ...

ICASSP, 2026

MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation

Akira Takahashi*, Shusuke Takahashi*, Yuki Mitsufuji

We introduce MMAudioSep, a generative model for video/text-queried sound separation that is founded on a pretrained video-to-audio model. By leveraging knowledge about the relationship between video/text and audio learned through a pretrained audio generative model, we can t...

ICASSP, 2026

Towards Blind Data Cleaning: A Case Study in Music Source Separation

Azalea Gui, Woosung Choi, Junghyun Koo, Kazuki Shimada, Takashi Shibuya, Joan Serrà, Wei-Hsiang Liao, Yuki Mitsufuji

The performance of deep learning models for music source separation heavily depends on training data quality. However, datasets are often corrupted by difficult-to-detect artifacts such as audio bleeding and label noise. Since the type and extent of contamination are typical...

ICASSP, 2026

SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation

Kazuki Shimada, Christian Simon, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji

This work addresses the lack of multimodal generative models capable of producing high-quality videos with spatially aligned audio. While recent advancements in generative models have been successful in video generation, they often overlook the spatial alignment between audi...

ICASSP, 2026

Do Foundational Audio Encoders Understand Music Structure?

Keisuke Toyama*, Zhi Zhong*, Akira Takahashi*, Shusuke Takahashi*, Yuki Mitsufuji

In music information retrieval (MIR) research, the use of pretrained foundational audio encoders (FAEs) has recently become a trend. FAEs pretrained on large amounts of music and audio data have been shown to improve performance on MIR tasks such as music tagging and automat...

CHI, 2025

EyeO: Autocalibrating Gaze Output with Gaze Input for Gaze Typing

Akanksha Saran, Jacob Alber*, Cyril Zhang*, Ann Paradiso*, Danielle Bragg*, John Langford*

Gaze tracking devices have the potential to expand interactivity greatly, yet miscalibration remains a significant barrier to use. As devices miscalibrate, people tend to compensate by intentionally offsetting their gaze, which makes detecting miscalibration from eye signals...

THRI, 2025

Human-Interactive Robot Learning: Definition, Challenges, and Recommendations

Kim Baraka*, Ifrah Idrees*, Taylor Kessler Faulkner*, Erdem Biyik*, Serena Booth*, Mohamed Chetouani*, Daniel Grollman*, Akanksha Saran, Emmanuel Senft*, Silvia Tulli*, Anna-Lisa Vollmer*, Antonio Andriella*, Helen Beierling*, Tiffany Horter*, Jens Kober*, Isaac Sheidlower*, Matthew Taylor*, Sanne van Waveren*, Xuesu Xiao*

Robot learning from humans has been proposed and researched for several decades as a means to enable robots to learn new skills or adapt existing ones to new situations. Recent advances in artificial intelligence, including learning approaches like reinforcement learning and...

AAAI, 2025

Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation

Kumud Tripathi, Aditya Srinivas Menon, Aman Gupta, Raj Prakash Gohil, Pankaj Wasnik

The Whisper model, an open-source automatic speech recognition system, is widely adopted for its strong performance across multilingual and zero-shot settings. However, it frequently suffers from hallucination errors, especially under noisy acoustic conditions. Previous work...

WIRES, 2025

XAI-Guided Continual Learning: Rationale, Methods, and Future Directions

Michela Proietti*, Alessio Ragno*, Roberto Capobianco

Providing neural networks with the ability to learn new tasks sequentially represents one of the main challenges in artificial intelligence. Unlike humans, neural networks are prone to losing previously acquired knowledge upon learning new information, a phenomenon known as ...

WSDM, 2025

Interpretable Memory-based Prototypical Pooling

Alessio Ragno*, Roberto Capobianco

Graph Neural Networks (GNNs) have proven their effectiveness in various graph-structured data applications. However, one of the significant challenges in the realm of GNNs is representation learning, a critical concept that bridges graph pooling, aimed at creating compressed...

CCN, 2025

Intermediate Layers of LLMs Align Best With the Brain by Balancing Short- and Long-Range Information

Michela Proietti*, Roberto Capobianco, Mariya Toneva

Contextual integration is fundamental to human language comprehension. Language models are a powerful tool for studying how contextual information influences brain activity. In this work, we analyze the brain alignment of three types of language models, which vary in how the...

RLC, 2025

ProtoCRL: Prototype-based Network for Continual Reinforcement Learning

Michela Proietti*, Peter R. Wurman, Peter Stone, Roberto Capobianco

The purpose of continual reinforcement learning is to train an agent on a sequence of tasks such that it learns the ones that appear later in the sequence while retaining the ability to perform the tasks that appeared earlier. Experience replay is a popular method used to mak...

NEURIPS, 2025

Automated Reward Design for Gran Turismo

Michel Ma, Takuma Seno, Kaushik Subramanian, Peter R. Wurman, Peter Stone, Craig Sherstan

When designing reinforcement learning (RL) agents, a designer communicates the desired agent behavior through the definition of reward functions - numerical feedback given to the agent as reward or punishment for its actions. However, mapping desired behaviors to reward func...

AAAI, 2025

Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry

Junyoung Seo*, Jisang Han*, Jaewoo Jung*, Siyoon Jin, JoungBin Lee*, Takuya Narihira, Kazumi Fukuda, Takashi Shibuya, Donghoon Ahn, Shoukang Hu, Seungryong Kim*, Yuki Mitsufuji

We introduce Vid-CamEdit, a novel framework for video camera trajectory editing, enabling the re-synthesis of monocular videos along user-defined camera paths. This task is challenging due to its ill-posed nature and the limited multi-view video data for training. Traditiona...

AAAI, 2025

SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing

Xinlei Niu, Kin Wai Cheuk, Jing Zhang, Naoki Murata, Chieh-Hsin Lai, Michele Mancusi, Woosung Choi, Giorgio Fabbro*, Wei-Hsiang Liao, Charles Patrick Martin, Yuki Mitsufuji

Music editing is an important step in music production, which has broad applications, including game development and film production. Most existing zero-shot text-guided methods rely on pretrained diffusion models by involving forward-backward diffusion processes for editing...

CSCW, 2025

Responsibly Training Foundation Models: Actualizing Ethical Principles for Curating Large-Scale Training Datasets in the Era of Massive AI Models

Morgan Klaus Scheuerman, Dora Zhao*, Jerone T. A. Andrews, Abeba Birhane, Q. Vera Liao*, Georgia Panagiotidou*, Pooja Chitre*, Kathleen Pine, Shawn Walker*, Jieyu Zhao*, Alice Xiang

AI technologies have become ubiquitous, influencing domains from healthcare to finance and permeating our daily lives. Concerns about the values underlying the creation and use of datasets to develop AI technologies are growing. Current dataset practices often disregard crit...

CSCW, 2025

How Data Workers Shape Datasets: The Role of Positionality in Data Collection and Annotation for Computer Vision

Morgan Klaus Scheuerman, Allison Woodruff, Jed R. Brubaker

Data workers play a key role in the big data industry. Clients hire data workers to collect and annotate data with human identity concepts, like demographic categories or clothing items. Often, such workers are treated as computational—they are expected to quickly and object...

ICCV, 2025

Learning Hierarchical Line Buffer for Image Processing

Jiacheng Li, Feiran Li, Daisuke Iso

In recent years, neural networks have achieved significant progress in offline image processing. However, in online scenarios, particularly in on-chip implementations, memory usage emerges as a critical bottleneck due to the limited memory resources of integrated image proce...

NEURIPS, 2025

Music Arena: Live Evaluation for Text-to-Music

Yonghyun Kim, Wayne Chi, Anastasios N. Angelopoulos, Wei-Lin Chiang, Koichi Saito, Shinji Watanabe, Yuki Mitsufuji, Chris Donahue

We present Music Arena, an open platform for scalable human preference evaluation of text-to-music (TTM) models. Soliciting human preferences via listening studies is the gold standard for evaluation in TTM, but these studies are expensive to conduct and difficult to compare...

NEURIPS, 2025

Large-Scale Training Data Attribution for Music Generative Models via Unlearning

Woosung Choi, Junghyun Koo, Kin Wai Cheuk, Joan Serrà, Marco A. Martínez-Ramírez, Yukara Ikemiya, Naoki Murata, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji

This paper explores the use of unlearning methods for training data attribution (TDA) in music generative models trained on large-scale datasets. TDA aims to identify which specific training data points contributed to the generation of a particular output from a specific mod...

NEURIPS, 2025

Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion

Michail Dontas, Yutong He, Naoki Murata, Yuki Mitsufuji, J. Zico Kolter*, Ruslan Salakhutdinov*

Blind inverse problems, where both the target data and forward operator are unknown, are crucial to many computer vision applications. Existing methods often depend on restrictive assumptions such as additional training, operator linearity, or narrow image distributions, thu...

ISMIR, 2025

Enhancing neural audio fingerprint robustness to audio degradation for music identification

R. Oguz Araz, Guillem Cortès-Sebastià, Emilio Molina, Joan Serrà, Xavier Serra, Yuki Mitsufuji, Dmitry Bogdanov

Audio fingerprinting (AFP) allows the identification of unknown audio content by extracting compact representations, termed audio fingerprints, that are designed to remain robust against common audio degradations. Neural AFP methods often employ metric learning, where repres...

TMLR, 2025

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov*, J. Zico Kolter*

Prompt engineering is an effective but labor-intensive way to control text-to-image (T2I) generative models. Its time-intensive nature and complexity have spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transf...

TMLR, 2025

GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models

Muhammad Jehanzeb Mirza, Mengjie Zhao*, Zhuoyuan Mao, Sivan Doveh, Wei Lin, Paul Gavrikov, Michael Dorkenwald, Shiqi Yang*, Saurav Jha, Hiromi Wakaki*, Yuki Mitsufuji

In this work, we propose GLOV, which enables Large Language Models (LLMs) to act as implicit optimizers for Vision-Language Models (VLMs) to enhance downstream vision tasks. GLOV prompts an LLM with the downstream task description, querying it for suitable VLM prompts (e.g.,...

TMLR, 2025

G2D2: Gradient-Guided Discrete Diffusion for Image Inverse Problem Solving

Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Bac Nguyen, Stefano Ermon*, Yuki Mitsufuji

Recent literature has effectively leveraged diffusion models trained on continuous variables as priors for solving inverse problems. Notably, discrete diffusion models with discrete latent codes have shown strong performance, particularly in modalities suited for discrete co...

TISMIR, 2025

Reductive, Exclusionary, Normalising: The Limits of Generative AI

Fabio Morreale, Marco A. Martínez-Ramírez, Raul Masu, Wei-Hsiang Liao, Yuki Mitsufuji

Up until recently, most approaches to music generation were based on deductive logic: generative rules were devised on the basis of musicians’ preferences, subjective appreciation and dominant music theories. Machine learning (ML) introduced a paradigm shift: vast datasets o...

JAES, 2025

Reverse Engineering of Music Mixing Graphs With Differentiable Processors and Iterative Pruning

Sungho Lee*, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich*, Giorgio Fabbro*, Kyogu Lee*, Yuki Mitsufuji

Reverse engineering of music mixes aims to uncover how dry source signals are processed and combined to produce a final mix. In this paper, prior works are extended to reflect the compositional nature of mixing and search for a graph of audio processors. First, a mixing cons...

DAFX, 2025

DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions

Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Ben Hayes, Wei-Hsiang Liao, György Fazekas, Yuki Mitsufuji

This study introduces a novel and interpretable model, DiffVox, for matching vocal effects in music production. DiffVox, short for "Differentiable Vocal Fx", integrates parametric equalisation, dynamic range control, delay, and reverb with efficient differentiable implement...

WASPAA, 2025

Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior

Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Yuki Mitsufuji, György Fazekas

Style Transfer with Inference-Time Optimisation (ST-ITO) is a recent approach for transferring the applied effects of a reference audio to a raw audio track. It optimises the effect parameters to minimise the distance between the style embeddings of the processed audio and t...

ACL, 2025

In-Domain African Languages Translation Using LLMs and Multi-armed Bandits

Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik

Neural Machine Translation (NMT) systems face significant challenges when working with low-resource languages, particularly in domain adaptation tasks. These difficulties arise due to limited training data and suboptimal model generalization. As a result, selecting an opti- ...

ACL, 2025

Graph-Assisted Culturally Adaptable Idiomatic Translation for Indic languages

Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik

Translating multi-word expressions (MWEs) and idioms requires a deep understanding of the cultural nuances of both the source and target languages. This challenge is further amplified by the one-to-many nature of idiomatic translations, where a single source idiom can have m...

WASPAA, 2025

Can Large Language Models Predict Audio Effects Parameters from Natural Language?

Seungheon Doh*, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Juhan Nam*, Yuki Mitsufuji

In music production, manipulating audio effects (Fx) parameters through natural language has the potential to reduce technical barriers for non-experts. We present LLM2Fx, a framework leveraging Large Language Models (LLMs) to predict Fx parameters directly from textual desc...

ISMIR, 2025

Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures

Yen-Tung Yeh, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yi-Hsuan Yang, Yuki Mitsufuji

General-purpose audio representations have proven effective across diverse music information retrieval applications, yet their utility in intelligent music production remains limited by insufficient understanding of audio effects (Fx). Although previous approaches have empha...

ISMIR, 2025

ITO-Master: Inference-Time Optimization for Audio Effects Modeling of Music Mastering Processors

Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro*, Michele Mancusi, Yuki Mitsufuji

Music mastering style transfer aims to model and apply the mastering characteristics of a reference track to a target track, simulating the professional mastering process. However, existing methods apply fixed processing based on a reference track, limiting users' ability to...

ISMIR, 2025

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

Recent advances in text-to-music editing, which employ text queries to modify music (e.g. by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been co...

KDD, 2025

I See, Therefore I Do: Estimating Causal Effects for Image Treatments

A Thorat, R Kolla, Niranjan Pedanekar*

Causal effect estimation under observational studies is challenging due to the lack of ground truth data and treatment assignment bias. Though various methods exist in literature for addressing this problem, most of them ignore multi-dimensional treatment information by cons...

ACL, 2025

Bridging Perceptual Gaps in Food NLP: A Structured Approach Using Sensory Anchors

Kana Maruyama, Angel Hsing-Chi Hwang, Tarek R Besold

Understanding how humans perceive and describe food is essential for NLP applications such as semantic search, recommendation, and structured food communication. However, textual similarity often fails to reflect perceptual similarity, which is shaped by sensory experience, ...

MLCAD, 2025

GENIE-ASI: Generative Instruction and Executable Code for Analog Subcircuit Identification

Phuoc Pham, Arun Venkitaraman, Chia-Yu Hsieh, Andrea Bonetti, Stefan Uhlich*, Markus Leibl, Simon Hofmann, Eisaku Ohbuchi, Lorenzo Servadei, Ulf Schlichtmann, Robert Wille

Analog subcircuit identification is a core task in analog design, essential for simulation, sizing, and layout. Traditional methods often require extensive human expertise, rule-based encoding, or large labeled datasets. To address these challenges, we propose GENIE-ASI, the...

ACMMM, 2025

CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation

Yuanhong Chen, Kazuki Shimada, Christian Simon, Yukara Ikemiya, Takashi Shibuya, Yuki Mitsufuji

Binaural audio generation (BAG) aims to convert monaural audio to stereo audio using visual prompts, requiring a deep understanding of spatial and semantic information. The success of the BAG systems depends on the effectiveness of cross-modal reasoning and spatial understan...

MLCAD, 2025

Schemato -- An LLM for Netlist-to-Schematic Conversion

Ryoga Matsuo, Stefan Uhlich*, Arun Venkitaraman, Andrea Bonetti, Chia-Yu Hsieh, Ali Momeni, Lukas Mauch*, Augusto Capone, Eisaku Ohbuchi, Lorenzo Servadei

Machine learning models are advancing circuit design, particularly in analog circuits. They typically generate netlists that lack human interpretability. This is a problem as human designers heavily rely on the interpretability of circuit diagrams or schematics to intuitivel...

ICCV, 2025

TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models

Christian Simon, Masato Ishii, Akio Hayakawa, Zhi Zhong*, Shusuke Takahashi*, Takashi Shibuya, Yuki Mitsufuji

Recent conditional diffusion models still require heavy supervised fine-tuning to perform control on a category of tasks. Training-free conditioning via guidance with off-the-shelf models is a favorable alternative to avoid further fine-tuning on th...

ICCV, 2025

Beyond RGB: Adaptive Parallel Processing for RAW Object Detection

Shani Gamrian, Hila Barel, Feiran Li, Masakazu Yoshimura*, Daisuke Iso

Object detection models are typically applied to standard RGB images processed through Image Signal Processing (ISP) pipelines, which are designed to enhance sensor-captured RAW images for human vision. However, these ISP functions can lead to a loss of critical information ...

ICCV, 2025

Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution

Vlad Hosu, Lorenzo Agnolucci, Daisuke Iso, Dietmar Saupe*

Image Quality Assessment (IQA) measures and predicts perceived image quality by human observers. Although recent studies have highlighted the critical influence that variations in the scale of an image have on its perceived quality, this relationship has not been systematica...

ICCV, 2025

DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic

Munish Monga, Vishal Chudasama, Pankaj Wasnik, Biplab Banerjee*

Real-world object detection systems, such as those in autonomous driving and surveillance, must continuously learn new object categories and simultaneously adapt to changing environmental conditions. Existing approaches, Class Incremental Object Detection (CIOD) and Domain I...

ICCV, 2025

Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models

Zerui Tao, Yuhta Takida, Naoki Murata, Qibin Zhao*, Yuki Mitsufuji

Parameter-Efficient Fine-Tuning (PEFT) of text-to-image models has become an increasingly popular technique with many applications. Among the various PEFT methods, Low-Rank Adaptation (LoRA) and its variants have gained significant attention due to their effectiveness, enabl...

TOCHI, 2025

Transphobia is in the Eye of the Prompter: Trans-Centered Perspectives on Large Language Models

Morgan Klaus Scheuerman, Katy Weathington, Adrian Petterson, Dylan Thomas Doyle, Dipto Das, Michael Ann DeVito, Jed R. Brubaker

Large language models (LLMs) are the new hot trend being rapidly integrated into products and services—often, in chatbots. LLM-powered chatbots are expected to respond to any number of topics, including topics central to gender identity. In light of rising anti-trans discour...

AI4X, 2025

Literature-based Hypothesis Generation: Predicting the evolution of scientific literature to support scientists

Tarek R Besold, Uchenna Akujuobi, Samy Badreddine, Jihun Choi, Hatem ElShazly, Frederick Gifford, Kana Maruyama, Kae Nagano, Pablo Sanchez Martin, Thiviyan Thanapalasingam, Alessandra Toniato, Christoph Wehner

Science is advancing at an increasingly quick pace, as evidenced, for instance, by the exponential growth in the number of published research articles per year [1]. On the one hand, this poses an increasingly pressing challenge: Effectively navigating this ever-growing body o...

AI4X, 2025

Gastro-Health Project: Revolutionizing Personalized Nutrition and Health Forecasting Through Integrated AI Technologies

Uchenna Akujuobi, Jiu Yi, Maria Enrique Chung, Tarek Besold

Knowledge graphs are powerful tools for modelling complex, multi-relational data and supporting hypothesis generation, particularly in applications like drug repurposing. However, for predictive methods to gain acceptance as credible scientific tools, they must ensure not on...

CVPR, 2025

Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance

Sanchayan Santra, Vishal Chudasama, Pankaj Wasnik, Vineeth N Balasubramanian

Precise Event Spotting (PES) aims to identify events and their class from long, untrimmed videos, particularly in sports. The main objective of PES is to detect the event at the exact moment it occurs. Existing methods mainly rely on features from a large pre-trained network...

INTERSPEECH, 2025

A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs?

Yigitcan Özer, Woosung Choi, Joan Serrà, Mayank Kumar Singh*, Wei-Hsiang Liao, Yuki Mitsufuji

We introduce the Robust Audio Watermarking Benchmark (RAW-Bench), a benchmark for evaluating deep learning-based audio watermarking methods with standardized and systematic comparisons. To simulate real-world usage, we introduce a comprehensive audio attack pipeline with var...

ICML, 2025

Proto Successor Measure: Representing the Space of All Possible Solutions of Reinforcement Learning

Siddhant Agarwal*, Harshit Sikchi, Peter Stone, Amy Zhang*

Having explored an environment, intelligent agents should be able to transfer their knowledge to most downstream tasks within that environment. Referred to as "zero-shot learning," this ability remains elusive for general-purpose reinforcement learning algorithms. While rec...

ICML, 2025

How to Evaluate and Mitigate IP Infringement in Visual Generative AI?

Zhenting Wang, Chen Chen, Vikash Sehwag, Minzhou Pan*, Lingjuan Lyu

The popularity of visual generative AI models like DALL-E 3, Stable Diffusion XL, Stable Video Diffusion, and Sora has been increasing. Through extensive evaluation, we discovered that the state-of-the-art visual generative models can generate content that bears a striking r...

ICML, 2025

Hyperspherical Normalization for Scalable Deep Reinforcement Learning

Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, Jaegul Choo

Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstab...

ICML, 2025

Training Consistency Models with Variational Noise Coupling

Gianluigi Silvestri, Luca Ambrogioni, Chieh-Hsin Lai, Yuhta Takida, Yuki Mitsufuji

Consistency Training (CT) has recently emerged as a promising alternative to diffusion models, achieving competitive performance in image generation tasks. However, non-distillation consistency training often suffers from high variance and instability, and analyzing and impr...

ICML, 2025

Supervised Contrastive Learning from Weakly-labeled Audio Segments for Musical Version Matching

Joan Serrà, R. Oguz Araz, Dmitry Bogdanov, Yuki Mitsufuji

Detecting musical versions (different renditions of the same piece) is a challenging task with important applications. Because of the ground truth nature, existing approaches match musical versions at the track level (e.g., whole song). However, most applications require to ...

ICML, 2025

Distillation of Discrete Diffusion through Dimensional Correlations

Satoshi Hayakawa*, Yuhta Takida, Masaaki Imaizumi*, Hiromi Wakaki*, Yuki Mitsufuji

Diffusion models have demonstrated exceptional performances in various fields of generative modeling, but suffer from slow sampling speed due to their iterative nature. While this issue is being addressed in continuous domains, discrete diffusion models face unique challenge...

TMLR, 2025

Music Foundation Model as Generic Booster for Music Downstream Tasks

Wei-Hsiang Liao, Yuhta Takida, Yukara Ikemiya, Zhi Zhong*, Chieh-Hsin Lai, Giorgio Fabbro*, Kazuki Shimada, Keisuke Toyama*, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Shusuke Takahashi*, Stefan Uhlich*, Taketo Akama*, Woosung Choi, Yuichiro Koyama*, Yuki Mitsufuji

We demonstrate the efficacy of using intermediate representations from a single foundation model to enhance various music downstream tasks. We introduce SoniDo, a music foundation model (MFM) designed to extract hierarchical features from target music samples. By leveraging ...

RA-L, 2025

A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7

Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter R. Wurman

Deep reinforcement learning has achieved superhuman racing performance in high-fidelity simulators like Gran Turismo 7 (GT7). It typically utilizes global features that require instrumentation external to a car, such as precise localization of agents and opponents, limiting ...

IJCNN, 2025

Improving Vector-Quantized Image Modeling with Latent Consistency-Matching Diffusion

Bac Nguyen, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji

By embedding discrete representations into a continuous latent space, we can leverage continuous-space latent diffusion models to handle generative modeling of discrete data. However, despite their initial success, most latent diffusion methods rely on fixed pretrained embed...

IJCNN, 2025

A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation

Masato Ishii, Akio Hayakawa, Takashi Shibuya, Yuki Mitsufuji

In this work, we build a simple but strong baseline for sounding video generation. Given base diffusion models for audio and video, we integrate them with additional modules into a single model and train it to make the model jointly generate audio and video. To enhance align...

CVPR, 2025

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training

Kengo Uchida, Takashi Shibuya, Yuhta Takida, Naoki Murata, Julian Tanke, Shusuke Takahashi*, Yuki Mitsufuji

In text-to-motion generation, controllability as well as generation quality and speed has become increasingly critical. The controllability challenges include generating a motion of a length that matches the given textual description and editing the generated motions accordi...

CVPR, 2025

Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models

Jie Ren, Kangrui Chen, Yingqian Cui, Shenglai Zeng, Hui Liu, Yue Xing, Jiliang Tang, Lingjuan Lyu

Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, the advancement of T2I diffusion models presents significant risks, as the models could be exploited for malicious purposes, suc...

CVPR, 2025

CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI

Siyuan Cheng, Lingjuan Lyu, Zhenting Wang, Xiangyu Zhang, Vikash Sehwag

With the rapid advancement of generative AI, it is now possible to synthesize high-quality images in a few seconds. Despite the power of these technologies, they raise significant concerns regarding misuse. Current efforts to distinguish between real and AI-generated image...

CVPR, 2025

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu

As scaling laws in generative AI push performance, they simultaneously concentrate the development of these models among actors with large computational resources. With a focus on text-to-image (T2I) generative models, we aim to unlock this bottleneck by demonstrating very l...

CVPR, 2025

Argus: A Compact and Versatile Foundation Model for Vision

Weiming Zhuang, Chen Chen, Zhizhong Li, Sina Sajadmanesh, Jingtao Li, Jiabo Huang, Vikash Sehwag, Vivek Sharma, Hirotaka Shinozaki, Felan Carlo Garcia, Yihao Zhan, Naohiro Adachi, Ryoji Eki, Michael Spranger, Peter Stone, Lingjuan Lyu

While existing vision and multi-modal foundation models can handle multiple computer vision tasks, they often suffer from significant limitations, including huge demand for data and computational resources during training and inconsistent performance across vision tasks at d...

CVPR, 2025

ReRAW: RGB-to-RAW Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge

Radu Berdan, Beril Besbinar, Christoph Reinders, Junji Otsuka*, Daisuke Iso

Edge-based computer vision models running on compact, resource-limited devices benefit greatly from using unprocessed, detail-rich RAW sensor data instead of processed RGB images. Training these models, however, necessitates large labeled RAW datasets, which are costly and o...

CVPR, 2025

Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising

Feiran Li, Haiyang Jiang, Daisuke Iso

Noise synthesis is a promising solution for addressing the data shortage problem in data-driven low-light RAW image denoising. However, accurate noise synthesis methods often necessitate labor-intensive calibration and profiling procedures during preparation, preventing them...

CVPR, 2025

VinaBench: Benchmark for Faithful and Consistent Visual Narratives

Silin Gao*, Sheryl Mathew, Li Mi, Sepideh Mamooler, Mengjie Zhao*, Hiromi Wakaki*, Yuki Mitsufuji, Syrielle Montariol, Antoine Bosselut*

Visual narrative generation transforms textual narratives into sequences of images illustrating the content of the text. However, generating visual narratives that are faithful to the input text and self-consistent across generated images remains an open challenge, due to th...

SIGIR, 2025

LLM-BRec: Personalizing Session-based Social Recommendation with LLM-BERT Fusion Framework

Raksha Jalan, Tushar Prakash, Niranjan Pedanekar*

Recommendation models enhance online user engagement by suggesting personalized content, boosting satisfaction and retention. Session-based Recommender systems (SR) have become a significant approach, focusing on capturing users' short-term preferences for more accurate reco...

NAACL, 2025

Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction

Kritarth Prasad, Mohammadi Zaki, Pratik Singh, Pankaj Wasnik

Ensembling neural machine translation (NMT) models to produce higher-quality translations than the $L$ individual models has been extensively studied. Recent methods typically employ a candidate selection block (CSB) and an encoder-decoder fusion block (FB), requiring infere...

ICLR, 2025

Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space

Yangming Li, Chieh-Hsin Lai, Carola-Bibiane Schönlieb, Yuki Mitsufuji, Stefano Ermon*

Deep Generative Models (DGMs), including Energy-Based Models (EBMs) and Score-based Generative Models (SGMs), have advanced high-fidelity data generation and complex continuous distribution approximation. However, their application in Markov Decision Processes (MDPs), partic...

CVPR, 2025

Classifier-Free Guidance inside the Attraction Basin May Cause Memorization

Anubhav Jain, Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, Julian Togelius, Yuki Mitsufuji

Diffusion models are prone to exactly reproduce images from the training data. This exact reproduction of the training data is concerning as it can lead to copyright infringement and/or leakage of privacy-sensitive information. In this paper, we present a novel way to unders...

CVPR, 2025

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, Alexander Schwing, Yuki Mitsufuji

We propose to synthesize high-quality and synchronized audio, given video and optional text conditions, using a novel multimodal joint training framework MMAudio. In contrast to single-modality training conditioned on (limited) video data only, MMAudio is jointly trained wit...

IEEE ACCESS, 2025

Transformative Movie Discovery: Large Language Models for Recommendation and Genre Prediction

Shubham Raj, Anurag Sharma, Sriparna Saha*, Brijraj Singh*, Niranjan Pedanekar*

In the era of digital streaming platforms, personalized movie recommendations, and genre prediction have become pivotal for enhancing user engagement and satisfaction. With the growing number of OTT (Over-The-Top) platforms like Netflix, Amazon Prime Video, and Disney+, the ...

WWW, 2025

Efficacy of Large Language Models in Predicting Hindi Movies' Attributes: A Comprehensive Survey and Content-Based Analysis

Prabir Mondal*, Siddharth Singh*, Kushum*, Sriparna Saha*, Jyoti Prakash Singh*, Brijraj Singh*, Niranjan Pedanekar*

This research explores the efficacy of four state-of-the-art Large Language Models (LLMs): GPT-3.5-turbo-0301, Vicuna, PaLM 2, and Dolly in predicting (i) movie genres using audio transcripts of movie trailers and (ii) meta-information such as director and cast details using...

WWW, 2025

Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Model

Jie Ren, Kangrui Chen, Chen Chen, Vikash Sehwag, Yue Xing, Jiliang Tang, Lingjuan Lyu

Large Language Models (LLMs) and Vision-Language Models (VLMs) have made significant advancements in a wide range of natural language processing and vision-language tasks. Access to large web-scale datasets has been a key factor in their success. However, concerns have been ...

CVPRW, 2025

Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection

Ayush Ghadiya, Purbayan Kar, Vishal Chudasama, Pankaj Wasnik

Recently, weakly supervised video anomaly detection (WS-VAD) has emerged as a contemporary research direction to identify anomaly events like violence and nudity in videos using only video-level labels. However, this task has substantial challenges, including addressing imba...

ICASSP, 2025

VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression

Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo, Yukara Ikemiya, Zhi Zhong*, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee*, Wei-Hsiang Liao, Yuki Mitsufuji

Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of rate-distortion tradeoff, particularly ...

ICASSP, 2025

Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer

Michele Mancusi, Yurii Halychanskyi, Kin Wai Cheuk, Eloi Moliner, Chieh-Hsin Lai, Stefan Uhlich*, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro*, Yuki Mitsufuji

Music timbre transfer is a challenging task that involves modifying the timbral characteristics of an audio signal while preserving its melodic structure. In this paper, we propose a novel method based on dual diffusion bridges, trained using the CocoChorales Dataset, which ...

ICASSP, 2025

30+ Years of Source Separation Research: Achievements and Future Challenges

Shoko Araki, Nobutaka Ito, Reinhold Haeb-Umbach, Gordon Wichern, Zhong-Qiu Wang, Yuki Mitsufuji

Source separation (SS) of acoustic signals is a research field that emerged in the mid-1990s and has flourished ever since. On the occasion of ICASSP's 50th anniversary, we review the major contributions and advancements in the past three decades in the speech, audio, and mu...

AAAI, 2025

Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning

Yuchen Liu*, Chen Chen, Lingjuan Lyu, Yaochu Jin, Gang Chen*

Federated Learning (FL) is notorious for its vulnerability to Byzantine attacks. Most current Byzantine defenses share a common inductive bias: among all the gradients, the densely distributed ones are more likely to be honest. However, such a bias is a poison to Byzantine r...

EXPLIMED, 2025

Identifying Candidates for Protein-Protein Interaction: A Focus on NKp46’s Ligands

Alessia Borghini, Federico Di Valerio, Alessio Ragno*, Roberto Capobianco

Recent advances in protein-protein interaction (PPI) research have harnessed the power of artificial intelligence (AI) to enhance our understanding of protein behaviour. These approaches have become indispensable tools in the field of biology and medicine, enabling scientists ...

ECAI, 2025

Neural Reward Machines

Elena Umili*, Francesco Argenziano*, Roberto Capobianco

Non-Markovian Reinforcement Learning (RL) tasks are very hard to solve, because agents must consider the entire history of state-action pairs to act rationally in the environment. Most works use symbolic formalisms (as Linear Temporal Logic or automata) to specify the temporall...

ECAI, 2025

Transparent Explainable Logic Layers

Alessio Ragno*, Marc Plantevit, Celine Robardet, Roberto Capobianco

Explainable AI seeks to unveil the intricacies of black box models through post-hoc strategies or self-interpretable models. In this paper, we tackle the problem of building layers that are intrinsically explainable through logical rules. In particular, we address current st...

ECCV, 2025

AIM 2024 Challenge on UHD Blind Photo Quality Assessment

Vlad Hosu, Marcos V. Conde, Lorenzo Agnolucci, Nabajeet Barman, Saman Zadtootaghaj, Radu Timofte

We introduce the AIM 2024 UHD-IQA Challenge, a competition to advance the No-Reference Image Quality Assessment (NR-IQA) task for modern, high-resolution photos. The challenge is based on the recently released UHD-IQA Benchmark Database, which comprises 6,073 UHD-1 (4K) imag...

ECCV, 2025

UHD-IQA Benchmark Database: Pushing the Boundaries of Blind Photo Quality Assessment

Vlad Hosu, Lorenzo Agnolucci, Oliver Wiedemann, Daisuke Iso, Dietmar Saupe*

We introduce a novel Image Quality Assessment (IQA) dataset comprising 6073 UHD-1 (4K) images, annotated at a fixed width of 3840 pixels. Contrary to existing No-Reference (NR) IQA datasets, ours focuses on highly aesthetic photos of high technical quality, filling a gap in ...

NEURIPS, 2025

Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation

Yin-Jyun Luo, Kin Wai Cheuk, Woosung Choi, Toshimitsu Uesaka, Keisuke Toyama*, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Wei-Hsiang Liao, Simon Dixon, Yuki Mitsufuji

Existing work on pitch and timbre disentanglement has been mostly focused on single-instrument music audio, excluding the cases where multiple instruments are presented. To fill the gap, we propose DisMix, a generative framework in which the pitch and timbre representations ...

NEURIPS, 2025

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Koichi Saito, Dongjun Kim*, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong*, Yuhta Takida, Yuki Mitsufuji

Sound content is an indispensable element for multimedia works such as video games, music, and films. Recent high-quality diffusion-based sound generation models can serve as valuable tools for the creators. However, despite producing high-quality sounds, these models often ...

NEURIPS, 2025

LOCKEY: A Novel Approach to Model Authentication and Deepfake Tracking

Mayank Kumar Singh*, Naoya Takahashi, Wei-Hsiang Liao, Yuki Mitsufuji

This paper presents a novel approach to deter unauthorized deepfakes and enable user tracking in generative models, even when the user has full access to the model parameters, by integrating key-based model authentication with watermarking techniques. Our method involves pro...

WACV, 2025

RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation

Christoph Reinders, Radu Berdan, Beril Besbinar, Junji Otsuka*, Daisuke Iso

Current deep learning approaches in computer vision primarily focus on RGB data sacrificing information. In contrast, RAW images offer richer representation, which is crucial for precise recognition, particularly in challenging conditions like low-light environments. The res...

ICLR, 2025

SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning

Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno

Recent advances in CV and NLP have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simpli...

ICLR, 2025

Residual-MPPI: Online Policy Customization for Continuous Control

Pengcheng Wang, Chenran Li, Catherine Weaver*, Kenta Kawamoto, Masayoshi Tomizuka*, Chen Tang*, Wei Zhan*

Policies learned through Reinforcement Learning (RL) and Imitation Learning (IL) have demonstrated significant potential in achieving advanced performance in continuous control tasks. However, in real-world environments, it is often necessary to further customize a trained pol...

ICLR, 2025

Weighted Point Cloud Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric

Toshimitsu Uesaka, Taiji Suzuki, Yuhta Takida, Chieh-Hsin Lai, Naoki Murata, Yuki Mitsufuji

In typical multimodal contrastive learning, such as CLIP, encoders produce one point in the latent representation space for each input. However, one-point representation has difficulty in capturing the relationship and the similarity structure of a huge amount of instances in...

ICLR, 2025

Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning

Shang-Fu Chen, Chieh-Hsin Lai, Dongjun Kim*, Naoki Murata, Takashi Shibuya, Wei-Hsiang Liao, Shao-Hua Sun, Yuki Mitsufuji, Ayano Hiranaka

Controllable generation through Stable Diffusion (SD) fine-tuning aims to improve fidelity, safety, and alignment with human guidance. Existing reinforcement learning from human feedback methods usually rely on predefined heuristic reward functions or pretrained reward model...

ICLR, 2025

Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models

Saurav Jha, Shiqi Yang*, Masato Ishii, Mengjie Zhao*, Christian Simon, Muhammad Jehanzeb Mirza, Dong Gong, Lina Yao, Shusuke Takahashi*, Yuki Mitsufuji

Personalized text-to-image diffusion models have grown popular for their ability to efficiently acquire a new concept from user-defined text descriptions and a few images. However, in the real world, a user may wish to personalize a model on multiple concepts but one at a ti...

ICLR, 2025

Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models

Yong-Hyun Park, Chieh-Hsin Lai, Satoshi Hayakawa*, Yuhta Takida, Yuki Mitsufuji

Diffusion models have seen notable success in continuous domains, leading to the development of discrete diffusion models (DDMs) for discrete variables. Despite recent advances, DDMs face the challenge of slow sampling speeds. While parallel sampling methods like τ-leaping ac...

ICLR, 2025

Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation

Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

In this study, we aim to construct an audio-video generative model with minimal computational cost by leveraging pre-trained single-modal generative models for audio and video. To achieve this, we propose a novel method that guides single-modal models to cooperatively genera...

ICLR, 2025

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation

Koichi Saito, Dongjun Kim*, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong*, Yuhta Takida, Yuki Mitsufuji

Sound content creation, essential for multimedia works such as video games and films, often involves extensive trial-and-error, enabling creators to semantically reflect their artistic ideas and inspirations, which evolve throughout the creation process, into the sound. Rece...

RO-MAN, 2025

Dobby: A Conversational Service Robot Driven by GPT-4

Carson Stark, Bohkyung Chun, Casey Charleston, Varsha Ravi, Luis Pabon, Surya Sunkari, Tarun Mohan, Peter Stone, Justin Hart*

This work introduces a robotics platform which embeds a conversational AI agent in an embodied system for natural language understanding and intelligent decision-making for service tasks; integrating task planning and human-like conversation. The agent is derived from a larg...

NEURIPS, 2025

Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning

Jiaheng Hu*, Roberto Martin-Martin*, Peter Stone, Zizhao Wang*

A hallmark of intelligent agents is the ability to learn reusable skills purely from unsupervised interaction with the environment. However, existing unsupervised skill discovery methods often learn entangled skills where one skill variable simultaneously influences many ent...

NEURIPS, 2025

SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions

Zizhao Wang*, Jiaheng Hu*, Roberto Martin-Martin*, Amy Zhang*, Scott Niekum*, Peter Stone, Caleb Chuck, Stephen Chen

Unsupervised skill discovery carries the promise that an intelligent agent can learn reusable skills through autonomous, reward-free environment interaction. Existing unsupervised skill discovery methods learn skills by encouraging distinguishable behaviors that cover divers...

NEURIPS, 2025

N-Agent Ad Hoc Teamwork

Caroline Wang*, Arrasy Rahman*, Ishan Durugkar, Elad Liebman*, Peter Stone

Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algor...

CORL, 2025

Learning to Look: Seeking Information for Decision Making via Policy Factorization

Jiaheng Hu*, Peter Stone, Roberto Martin-Martin*, Ben Abbatematteo, Shivin Dass

Many robot manipulation tasks require active or interactive exploration behavior in order to be performed successfully. Such tasks are ubiquitous in embodied domains, where agents must actively search for the information necessary for each stage of a task, e.g., moving the h...

EMNLP, 2025

LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning

Zifan Xu*, Peter Stone, Dmitriy Bespalov, Haozhu Wang, Xian Wu, Yanjun Qi

Chain-of-thought (CoT) prompting is a popular in-context learning (ICL) approach for large language models (LLMs), especially when tackling complex reasoning tasks. Traditional ICL approaches construct prompts using examples that contain questions similar to the input questio...

WACV, 2025

Open-Set Object Detection By Aligning Known Class Representations

Vishal Chudasama, Naoyuki Onoe*, Pankaj Wasnik, Hiran Sarkar, Vineeth N Balasubramanian

Open-Set Object Detection (OSOD) has emerged as a contemporary research direction to address the detection of unknown objects. Recently, few works have achieved remarkable performance in the OSOD task by employing contrastive clustering to separate unknown classes. In contra...

ICASSP, 2025

Enhancing Whisper's Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization

Pankaj Wasnik, Kumud Tripathi, Raj Gothi

Automatic speech recognition has recently seen a significant advancement with large foundational models such as Whisper. However, these models often struggle to perform well in low-resource languages, such as Indian languages. This paper explores two novel approaches to enha...

AAAI, 2025

EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion

Ashishkumar Gudmalwar, Nirmesh Shah*, Pankaj Wasnik, Ishan Biyani, Rajiv R. Shah

The Emotional Voice Conversion (EVC) aims to convert the discrete emotional state from the source emotion to the target for a given speech utterance while preserving linguistic content. In this paper, we propose regularizing emotion intensity in the diffusion-based EVC frame...

AAAI, 2025

Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs

Mohammadi Zaki, Pankaj Wasnik, Pratik Rakesh Singh

We address the challenging task of neural machine translation (NMT) in the entertainment domain, where the objective is to automatically translate a given dialogue from a source language content to a target language. This task has various applications, particularly in automa...

ECAI, 2025

DeepDFA: Automata Learning through Neural Probabilistic Relaxations

Elena Umili*, Roberto Capobianco

In this work, we introduce DeepDFA, a novel approach to identifying Deterministic Finite Automata (DFAs) from traces, harnessing a differentiable yet discrete model. Inspired by both the probabilistic relaxation of DFAs and Recurrent Neural Networks (RNNs), our model offers ...

MIREX, 2025

Discogs-VINet-MIREX

Xavier Serra, Yuki Mitsufuji, R.O. Araz, J. Serrà, D. Bogdanov

This technical report presents our submission to the cover song identification task for the 2024 edition of the Music Information Retrieval Evaluation eXchange (MIREX). For this submission, we enhanced our Discogs-VINet model by changing the definition of an epoch, incorpora...

NEURIPS, 2024

N-agent Ad Hoc Teamwork

Caroline Wang*, Arrasy Rahman*, Ishan Durugkar, Elad Liebman*, Peter Stone

NEURIPS, 2024

Towards Exact Gradient-based Training on Analog In-memory Computing

Zhaoxian Wu*, Tayfun Gokmen*, Malte J. Rasch, Tianyi Chen*

Analog in-memory accelerators present a promising solution for energy-efficient training and inference of large vision or language models. While the inference on analog accelerators has been studied recently, the analog training perspective is under-explored. Recent studies ...

NEURIPS, 2024

PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

Dongjun Kim*, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon*

To accelerate sampling, diffusion models (DMs) are often distilled into generators that directly map noise to data in a single step. In this approach, the resolution of the generator is fundamentally limited by that of the teacher DM. To overcome this limitation, we propose ...

NEURIPS, 2024

GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping

Junyoung Seo*, Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, Chieh-Hsin Lai, Seungryong Kim*, Yuki Mitsufuji

Generating novel views from a single image remains a challenging task due to the complexity of 3D scenes and the limited diversity in the existing multi-view datasets to train a model on. Recent research combining large-scale text-to-image (T2I) models with monocular depth e...

NEURIPS, 2024

A Taxonomy of Challenges to Curating Fair Datasets

Dora Zhao*, Morgan Klaus Scheuerman, Pooja Chitre*, Jerone Andrews, Georgia Panagiotidou*, Shawn Walker*, Kathleen H. Pine*, Alice Xiang

Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade...

CORL, 2024

Policy composition via multi-objective reinforcement learning

Shruti Mishra, Ankit Anand, Jordan Hoffmann, Nicolas Heess, Martin Riedmiller, Abbas Abdolmaleki, Doina Precup

We enable reinforcement learning agents to learn successful behavior policies by utilizing relevant pre-existing teacher policies. The teacher policies are introduced as objectives, in addition to the task objective, in a multi-objective policy optimization setting. Using th...

NEURIPS, 2024

FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low-Rank Adaptations

Lingjuan Lyu, Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Ang Li

The rapid development of Large Language Models (LLMs) has been pivotal in advancing AI, with pre-trained LLMs being adaptable to diverse downstream tasks through fine-tuning. Federated learning (FL) further enhances fine-tuning in a privacy-aware manner by utilizing clients'...

NEURIPS, 2024

pFedClub: Controllable Heterogeneous Model Aggregation for Personalized Federated Learning

Jiaqi Wang*, Lingjuan Lyu, Fenglong Ma*, Qi Li

Federated learning, a pioneering paradigm, enables collaborative model training without exposing users’ data to central servers. Most existing federated learning systems necessitate uniform model structures across all clients, restricting their practicality. Several methods ...

NEURIPS, 2024

CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence

Chaochao Chen*, Yizhao Zhang*, Lingjuan Lyu, Yuyuan Li*, Jiaming Zhang, Li Zhang, Biao Gong, Chenggang Yan

With increasing privacy concerns in artificial intelligence, regulations have mandated the right to be forgotten, granting individuals the right to withdraw their data from models. Machine unlearning has emerged as a potential solution to enable selective forgetting in model...

NEURIPS, 2024

FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection

Jiaqi Wang*, Lingjuan Lyu, Fenglong Ma*, Xiaochen Wang, Jinghui Chen

This study introduces the Federated Medical Knowledge Injection (FedMEKI) platform, a new benchmark designed to address the unique challenges of integrating medical knowledge into foundation models under privacy constraints. By leveraging a cross-silo federated learning appr...

NEURIPS, 2024

DECO-Bench: Unified Benchmark for Decoupled Task-Agnostic Synthetic Data Release

Lingjuan Lyu, Vivek Sharma, Farzaneh Askari

In this work, we tackle the question of how to systematically benchmark task-agnostic decoupling methods for privacy-preserving machine learning (ML). Sharing datasets that include sensitive information often triggers privacy concerns, necessitating robust decoupling methods...

ECCV, 2024

SIMBA: Split Inference - Mechanisms, Benchmarks and Attacks

Abhishek Singh*, Vivek Sharma, Ramesh Raskar*, Rohan Sukumaran, John Mose, Jeffrey Chiu, Justin Yu

In this work, we tackle the question of how to benchmark reconstruction of inputs from deep neural networks (DNN) representations. This inverse problem is of great importance in the privacy community where obfuscation of features has been proposed as a technique for privacy-...

ECCV, 2024

Masked Differential Privacy

Sina Sajadmanesh, Vikash Sehwag, Lingjuan Lyu, Vivek Sharma, David Schneider, Saquib Sarfraz, Rainer Stiefelhagen

Privacy-preserving computer vision is an important emerging problem in machine learning and artificial intelligence. The prevalent methods tackling this problem use differential privacy or anonymization and obfuscation techniques to protect the privacy of individuals. In b...

, 2024

Prosody as an Informative Teaching Signal for Agent Learning: Exploratory Studies and Algorithmic Implications

Akanksha Saran, Matilda Knierim, Sahil Jain, Murat Han Aydoğan, Kenneth Mitra, Kush Desai, Kim Baraka*

Agent learning from human interaction often relies on explicit signals, but implicit social cues, such as prosody in speech, could provide valuable information for more effective learning. This paper advocates for the integration of prosody as a teaching signal to enhance age...

EMNLP, 2024

Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective

Zhaotian Weng*, Zijun Gao*, Jerone Andrews, Jieyu Zhao*

Vision-language models (VLMs) pre-trained on extensive datasets can inadvertently learn biases by correlating gender information with specific objects or scenarios. Current methods, which focus on modifying inputs and monitoring changes in the model's output probability scor...

EMNLP, 2024

Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes

Yusuke Hirota, Jerone Andrews, Dora Zhao*, Orestis Papakyriakopoulos*, Apostolos Modas, Yuta Nakashima*, Alice Xiang

We tackle societal bias in image-text datasets by removing spurious correlations between protected groups and image attributes. Traditional methods only target labeled attributes, ignoring biases from unlabeled ones. Using text-guided inpainting models, our approach ensures ...

NEURIPS, 2024

Discovering Creative Behaviors through DUPLEX: Diverse Universal Features for Policy Exploration

Borja G. Leon*, Francesco Riccio, Kaushik Subramanian, Pete Wurman, Peter Stone

The ability to approach the same problem from different angles is a cornerstone of human intelligence that leads to robust solutions and effective adaptation to problem variations. In contrast, current RL methodologies tend to lead to policies that settle on a single solutio...

ECCV, 2024

A Simple Background Augmentation Method for Object Detection with Diffusion Model

Yuhang Li, Xin Dong, Chen Chen, Weiming Zhuang, Lingjuan Lyu

In computer vision, it is well-known that a lack of data diversity will impair model performance. In this study, we address the challenges of enhancing the dataset diversity problem in order to benefit various downstream tasks such as object detection and instance segmentati...

ECCV, 2024

Efficient Bias Mitigation Without Privileged Information

Mateo Espinosa Zarlenga*, Swami Sankaranarayanan, Jerone Andrews, Zohreh Shams, Mateja Jamnik*, Alice Xiang

Deep neural networks trained via empirical risk minimisation often exhibit significant performance disparities across groups, particularly when group and task labels are spuriously correlated (e.g., “grassy background” and “cows”). Existing bias mitigation methods that aim t...

ECCV, 2024

Finding a needle in a haystack: A Black-Box Approach to Invisible Watermark Detection

Minzhou Pan*, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin*

In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting. WMD is capable of detecting arbitrary watermarks within a given reference dataset using a clean non-watermarked dataset as a ref...

NATURE COMMUNICATIONS, 2024

Fast and robust analog in-memory deep neural network training

Malte J. Rasch, Fabio Carta*, Omobayode Fagbohungbe*, Tayfun Gokmen*

Analog in-memory computing is a promising future technology for efficiently accelerating deep learning networks. While using in-memory computing to accelerate the inference phase has been studied extensively, accelerating the training phase has received less attention, despi...

AIR, 2024

Revisiting named entity recognition in food computing: enhancing performance and robustness

Uchenna Akujuobi, Shuhong Liu*, Tarek R Besold

In the ever-evolving domain of food computing, named entity recognition (NER) presents transformative potential that extends far beyond mere word tagging in recipes. Its implications encompass intelligent recipe recommendations, health analysis, and personalization. Neverthe...

AIR, 2024

Link prediction for hypothesis generation: an active curriculum learning infused temporal graph-based approach

Uchenna Akujuobi, Priyadarshini Kumari, Jihun Choi, Samy Badreddine, Kana Maruyama, Sucheendra K Palaniappan*, Tarek R Besold

Over the last few years Literature-based Discovery (LBD) has regained popularity as a means to enhance the scientific research process. The resurgent interest has spurred the development of supervised and semi-supervised machine learning models aimed at making previously imp...

ACL, 2024

It is Simple Sometimes: A Study On Improving Aspect-Based Sentiment Analysis Performance

Laura Cabello*, Uchenna Akujuobi

Aspect-Based Sentiment Analysis (ABSA) involves extracting opinions from textual data about specific entities and their corresponding aspects through various complementary subtasks. Several prior studies have focused on developing ad hoc designs of varying complexities for t...

ACL, 2024

Analysis of Multi-Source Language Training in Cross-Lingual Transfer

Seong Hoon Lim*, Taejun Yun*, Jinhyeon Kim*, Jihun Choi, Taeuk Kim

The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have contributed to addressing this data scarcity problem, ...

RLC, 2024

A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo

Miguel Vasco*, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Pete Wurman, Peter Stone

Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics. Recently, an end-to-end deep reinforcement learning agent met this challenge in a high-fidelity racing simulator, Gran Tu...

EURASIP, 2024

The whole is greater than the sum of its parts: improving music source separation by bridging networks

Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich*, Shusuke Takahashi*, Yuki Mitsufuji

This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) with almost no increase in computational cost. It consists of three components: (i) multi-domain loss (MDL), (ii) bridging operation...

ICML, 2024

Measure dataset diversity, don’t just claim it

Dora Zhao*, Jerone T. A. Andrews, Orestis Papakyriakopoulos*, Alice Xiang

Machine learning (ML) datasets, often perceived as neutral, inherently encapsulate abstract and disputed social constructs. Dataset curators frequently employ value-laden terms such as diversity, bias, and quality to characterize datasets. Despite their prevalence, these ter...

ICML, 2024

PerceptAnon: Exploring the Human Perception of Image Anonymization Beyond Pseudonymization for GDPR

Kartik Patwari, Chen-Nee Chuah*, Lingjuan Lyu, Vivek Sharma

Current image anonymization techniques, largely focused on localized pseudonymization, typically modify identifiable features like faces or full bodies and evaluate anonymity through metrics such as detection and re-identification rates. However, this approach often overlooks ...

ICML, 2024

COALA: A Practical and Vision-Centric Federated Learning Platform

Weiming Zhuang, Jian Xu, Chen Chen, Jingtao Li, Lingjuan Lyu

We present COALA, a vision-centric Federated Learning (FL) platform, and a suite of benchmarks for practical FL scenarios, which we categorize as task, data, and model levels. At the task level, COALA extends support from simple classification to 15 computer vision tasks, in...

ECCV, 2024

Sparo: Selective Attention for Robust and Compositional Transformer Encodings for Vision

Ankit Vani*, Bac Nguyen, Samuel Lavoie*, Ranjay Krishna*, Aaron Courville*

Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion...

ECCV, 2024

SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning

Bac Nguyen, Stefan Uhlich*, Fabien Cardinaux*, Lukas Mauch*, Marzieh Edraki*, Aaron Courville*

Handling distribution shifts from training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in the field of machine learning. While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-shot performance, further ...

SSE, 2024

Analog AI as a Service: A Cloud Platform for In-Memory Computing

Kaoutar El Maghraoui*, Kim Tran*, Kurtis Ruby*, Borja Godoy*, Jordan Murray*, Manuel Le Gallo-Bourdeau*, Todd Deshane*, Pablo Gonzalez*, Diego Moreda*, Hadjer Benmeziane*, Corey Liam Lammie*, Julian Büchel*, Malte J. Rasch, Abu Sebastian*, Vijay Narayanan*

This paper introduces the Analog AI Cloud Composer platform, a service that allows users to access Analog In-Memory Computing (AIMC) simulation and computing resources over the cloud. We introduce the concept of an Analog AI as a Service (AAaaS). AIMC offers a novel approach...

ACL, 2024

On the Language Encoder of Contrastive Cross-modal Models

Mengjie Zhao*, Junya Ono*, Zhi Zhong*, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Takashi Shibuya, Hiromi Wakaki*, Yuki Mitsufuji, Wei-Hsiang Liao

Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks. However, there has been limited investigation of and improvement in their language encoder, which is the central component of encoding natural language descri...

ACL, 2024

DiffuCOMET: Contextual Commonsense Knowledge Diffusion

Silin Gao*, Mete Ismayilzada*, Mengjie Zhao*, Hiromi Wakaki*, Yuki Mitsufuji, Antoine Bosselut*

Inferring contextually-relevant and diverse commonsense to understand narratives remains challenging for knowledge models. In this work, we develop a series of knowledge models, DiffuCOMET, that leverage diffusion to learn to reconstruct the implicit semantic connections bet...

ISMIR, 2024

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

Marco Comunità*, Zhi Zhong*, Akira Takahashi*, Shiqi Yang*, Mengjie Zhao*, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji

Recent advances in generative models that iteratively synthesize audio clips have brought great success to text-to-audio synthesis (TTA), but at the cost of slow synthesis speed and heavy computation. Although there have been attempts to accelerate the iterative procedure, high...

ISMIR, 2024

Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio

Roser Batlle-Roca*, Wei-Hsiang Liao, Xavier Serra, Yuki Mitsufuji, Emilia Gómez*

Recent advancements in music generation are raising multiple concerns about the implications of AI in creative music processes, current business models and impacts related to intellectual property management. A relevant challenge is the potential replication and plagiarism o...

ICML, 2024

A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization

Ashwinee Panda*, Xinyu Tang*, Vikash Sehwag, Saeed Mahloujifar*, Prateek Mittal*

An open problem in differentially private deep learning is hyperparameter optimization (HPO). DP-SGD introduces new hyperparameters and complicates existing ones, forcing researchers to painstakingly tune hyperparameters with hundreds of trials, which in turn makes it imposs...

ICML, 2024

How to Trace Latent Generative Model Generated Images without Artificial Watermark?

Zhenting Wang, Vikash Sehwag, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas*, Shiqing Ma*

Latent generative models (e.g., Stable Diffusion) have become more and more popular, but concerns have arisen regarding potential misuse related to images generated by these models. It is, therefore, necessary to analyze the origin of images by inferring if a particular imag...

INTERSPEECH, 2024

SilentCipher: Deep Audio Watermarking

Mayank Kumar Singh*, Naoya Takahashi, Yuki Mitsufuji, Wei-Hsiang Liao

In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and robustness over traditional...

RAL, 2024

BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay

Catherine Weaver*, Chen Tang*, Ce Hao*, Kenta Kawamoto, Masayoshi Tomizuka*, Wei Zhan*

Autonomous racing poses a significant challenge for control, requiring planning minimum-time trajectories under uncertain dynamics and controlling vehicles at their handling limits. Current methods requiring hand-designed physical models or reward functions specific to each ...

VLSI, 2024

State-Independent Low Resistance Drift SiSbTe Phase Change Memory for Analog In-Memory Computing Applications

HY Cheng*, Zhi-Lun Liu*, Amlan Majumdar*, Alexander Grun*, Asit Ray*, Jeff Su*, Malte J. Rasch, Fabio Carta*, Lynne Gignac*, Christian Lavoie*, Cheng-Wei Cheng*, M Bright Sky*, HL Lung*

We developed a phase-change memory (PCM), with SiSbTe material, that showed state-independent resistance drift (v~0.04) at 65°C over the entire analog conductance range. We evaluated this PCM for In Memory Compute (IMC) applications simulating the performance of BERT model w...

NAACL, 2024

Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning

Shivam R Mhaskar, Nirmesh Shah*, Mohammadi Zaki, Ashishkumar Gudmalwar, Pankaj Wasnik

Traditional Automatic Video Dubbing (AVD) pipeline consists of three key modules, namely, Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS). Within AVD pipelines, isometric-NMT algorithms are employed to regulate the length of the...

SAC, 2024

Optimizing Movie Selections: A Multi-Task, Multi-Modal Framework with Strategies for Missing Modality Challenges

Subham Raj*, Pawan Agrawal*, Sriparna Saha*, Brijraj Singh*, Niranjan Pedanekar*

Online recommendation systems have become a crucial feature of Over-the-Top (OTT) platforms, which provide streaming media content over the internet. OTT platforms, such as Netflix, Hulu, and Amazon Prime, use recommendation systems to suggest movies, TV shows, and other con...

DAFX, 2024

Searching for Music Mixing Graphs: A Pruning Approach

Sungho Lee*, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich*, Giorgio Fabbro*, Kyogu Lee*, Yuki Mitsufuji

Music mixing is compositional -- experts combine multiple audio processors to achieve a cohesive mix from dry source tracks. We propose a method to reverse engineer this process from the input and output audio. First, we create a mixing console that applies all available pro...

LREC-COLING, 2024

CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions

Donghee Choi*, Mogan Gim*, Donghyeon Park*, Mujeen Sung, Hyunjae Kim, Jaewoo Kang*, Jihun Choi

This paper introduces CookingSense, a descriptive collection of knowledge assertions in the culinary domain extracted from various sources, including web data, scientific papers, and recipes, from which knowledge covering a broad range of aspects is acquired. CookingSense is...

ICASSP, 2024

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

Takashi Shibuya, Yuhta Takida, Yuki Mitsufuji

Generative adversarial network (GAN)-based vocoders have been intensively studied because they can synthesize high-fidelity audio waveforms faster than real-time. However, it has been reported that most GANs fail to obtain the optimal projection for discriminating between re...

ISCAS, 2024

Improving the Accuracy of Analog-Based In-Memory Computing Accelerators Post-Training

Corey Lammie*, Athanasios Vasilopoulos*, Julian Büchel*, Giacomo Camposampiero*, Manuel Le Gallo*, Malte J. Rasch, Abu Sebastian*

Analog-Based In-Memory Computing (AIMC) inference accelerators can be used to efficiently execute Deep Neural Network (DNN) inference workloads. However, to mitigate accuracy losses, due to circuit and device non-idealities, Hardware-Aware (HWA) training methodologies must b...

INTERSPEECH, 2024

VECL-TTS: Voice identity and Emotional style aware Cross-Lingual TTS

Ashishkumar Gudmalwar, Nirmesh Shah*, Sai Akarsh, Pankaj Wasnik

Despite the significant advancements in Text-to-Speech (TTS) systems, their full utilization in automatic dubbing remains limited. This task necessitates the extraction of voice identity and emotional style from a reference speech in a source language and subsequently transf...

INTERSPEECH, 2024

DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing

Neha Sahipjohn, Ashishkumar Gudmalwar, Nirmesh Shah*, Pankaj Wasnik

Audio-visual alignment after dubbing is a challenging research problem. To this end, we propose a novel method, DubWise Multi-modal Large Language Model (LLM)-based Text-to-Speech (TTS), which can control the speech duration of synthesized speech in such a way that it aligns...

ICRA, 2024

Wait That Feels Familiar: Learning to Extrapolate Human Preferences for Preference-Aligned Path Planning

Haresh Karnan*, Elvin Yang*, Garrett Warnell*, Joydeep Biswas*, Peter Stone

Autonomous mobility tasks such as last-mile delivery require reasoning about operator-indicated preferences over terrains on which the robot should navigate to ensure both robot safety and mission success. However, coping with out-of-distribution data from novel terrains or a...

COACM, 2024

Now, Later, and Lasting: 10 Priorities for AI Research, Policy, and Practice

Eric Horvitz*, Vincent Conitzer*, Sheila McIlraith*, Peter Stone

Advances in artificial intelligence (AI) will transform many aspects of our lives and society, bringing immense opportunities but also posing significant risks and challenges. The next several decades may well be a turning point for humanity, comparable to the industrial rev...

AAAI, 2024

Learning Optimal Advantage from Preferences and Mistaking it for Reward

W. Bradley Knox*, Sigurdur Orn Adalgeirsson*, Serena Booth*, Anca Dragan*, Peter Stone, Scott Niekum*

We consider algorithms for learning reward functions from human preferences over pairs of trajectory segments, as used in reinforcement learning from human feedback (RLHF). Most recent work assumes that human preferences are generated based only upon the reward accrued withi...

AAAI, 2024

Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents

Arrasy Rahman*, Jiaxun Cui*, Peter Stone

Robustly cooperating with unseen agents and human partners presents significant challenges due to the diverse cooperative conventions these partners may adopt. Existing Ad Hoc Teamwork (AHT) methods address this challenge by training an agent with a population of diverse tea...

ICRA, 2024

Rethinking Social Robot Navigation: Leveraging the Best of Two Worlds

Amir Hossain Raj*, Zichao Hu*, Haresh Karnan*, Rohan Chandra*, Amirreza Payandeh*, Luisa Mao*, Peter Stone, Joydeep Biswas*, Xuesu Xiao*

Empowering robots to navigate in a socially compliant manner is essential for the acceptance of robots moving in human-inhabited environments. Previously, roboticists have developed geometric navigation systems with decades of empirical validation to achieve safety and effic...

AR, 2024

The Human in the Loop: Perspectives and Challenges for RoboCup 2050

Alessandra Rossi*, Maike Paetzel-Prüsmann*, Merel Keijsers*, Michael Anderson*, Susan Leigh Anderson*, Daniel Barry*, Jan Gutsche*, Justin Hart*, Luca Iocchi*, Ainse Kokkelmans*, Wouter Kuijpers*, Yun Liu*, Daniel Polani*, Caleb Rascon*, Marcus Scheunemann*, Peter Stone, Florian Vahl*, René van de Molengraft*, Oskar von Stryk*

Robotics researchers have been focusing on developing autonomous and human-like intelligent robots that are able to plan, navigate, manipulate objects, and interact with humans in both static and dynamic environments. These capabilities, however, are usually developed for di...

TMLR, 2024

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes

Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Yuki Mitsufuji, Wei-Hsiang Liao

Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity recon...

ICASSP, 2024

Enhancing Semantic Communication with Deep Generative Models -- An ICASSP Special Session Overview

Eleonora Grassucci*, Yuki Mitsufuji, Ping Zhang*, Danilo Comminiello*

ICASSP, 2024

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

Hao Shi*, Kazuki Shimada, Masato Hirano*, Takashi Shibuya, Yuichiro Koyama*, Zhi Zhong*, Shusuke Takahashi*, Tatsuya Kawahara*, Yuki Mitsufuji

Diffusion-based speech enhancement (SE) has been investigated recently, but its decoding is very time-consuming. One solution is to initialize the decoding process with the enhanced feature estimated by a predictive SE system. However, this two-stage method ignores the compl...

ICASSP, 2024

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance

Carlos Hernandez-Olivan*, Koichi Saito, Naoki Murata, Chieh-Hsin Lai, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji

Restoring degraded music signals is essential to enhance audio quality for downstream music manipulation. Recent diffusion-based music restoration methods have demonstrated impressive performance, and among them, diffusion posterior sampling (DPS) stands out given its intrin...

ICASSP, 2024

Zero- and Few-shot Sound Event Localization and Detection

Kazuki Shimada, Kengo Uchida, Yuichiro Koyama*, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji, Tatsuya Kawahara*

Sound event localization and detection (SELD) systems estimate direction-of-arrival (DOA) and temporal activation for sets of target classes. Neural network (NN)-based SELD systems have performed well in various sets of target classes, but they only output the DOA and tempor...

ICASSP, 2024

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription

Frank Cwitkowitz*, Kin Wai Cheuk, Woosung Choi, Marco A. Martínez-Ramírez, Keisuke Toyama*, Wei-Hsiang Liao, Yuki Mitsufuji

In recent years, research on music transcription has focused mainly on architecture design and instrument-specific data acquisition. With the lack of availability of diverse datasets, progress is often limited to solo-instrument tasks such as piano transcription. Several wor...

ACC, 2024

Real-time Trajectory Generation via Dynamic Movement Primitives for Autonomous Racing

Catherine Weaver*, Roberto Capobianco, Peter R. Wurman, Peter Stone, Masayoshi Tomizuka*

We employ sequences of high-order motion primitives for efficient online trajectory planning, enabling competitive racecar control even when the car deviates from an offline demonstration. Dynamic Movement Primitives (DMPs) utilize a target-driven non-linear differential equ...

RAL, 2024

Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning

Ce Hao*, Catherine Weaver*, Chen Tang*, Kenta Kawamoto, Masayoshi Tomizuka*, Wei Zhan*

Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e., sequences of primitive actions. Typically, a skill ...

CVPR, 2024

FedMef: Towards Memory-efficient Federated Dynamic Pruning

Hong Huang, Weiming Zhuang, Chen Chen, Lingjuan Lyu

Federated learning (FL) promotes decentralized training while prioritizing data confidentiality. However, its application on resource-constrained devices is challenging due to the high demand for computation and memory resources for training deep learning models. Neural netw...

ICLR, 2024

DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models

Zhenting Wang, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas*, Shiqing Ma*

Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding the unauthorized data usage during the training or fine-tuning process. One example is when a model trainer collects a set of im...

AAAI, 2024

Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning

Zizhao Wang*, Caroline Wang*, Xuesu Xiao*, Yuke Zhu*, Peter Stone

Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach towards achieving both goals is ...

AAAI, 2024

Learning Optimal Advantage from Preferences and Mistaking it for Reward

W. Bradley Knox*, Stephane Hatgis-Kessell*, Sigurdur Orn Adalgeirsson*, Serena Booth*, Anca Dragan*, Peter Stone, Scott Niekum*

We consider algorithms for learning reward functions from human preferences over pairs of trajectory segments, as used in reinforcement learning from human feedback (RLHF), including those used to fine-tune ChatGPT and other contemporary language models. Most recent work o...

AAAI, 2024

Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators

Sikai Bai*, Shuaicheng Li*, Weiming Zhuang, Jie Zhang*, Kunlin Yang*, Jun Hou*, Shuai Yi*, Shuai Zhang*, Junyu Gao*

Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume...

RAL, 2024

Collaborative Multi-Object Tracking with Conformal Uncertainty Propagation

Sanbao Su*, Songyang Han, Yiming Li*, Zhili Zhang*, Chen Feng*, Caiwen Ding*, Fei Miao*

Object detection and multiple object tracking (MOT) are essential components of self-driving systems. Accurate detection and uncertainty quantification are both critical for onboard modules, such as perception, prediction, and planning, to improve the safety and robustness o...

TMLR, 2024

What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

Songyang Han, Sanbao Su*, Sihong He*, Shuo Han*, Haizhao Yang*, Shaofeng Zou*, Fei Miao*

Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state pertu...

IEEE T-ITS, 2024

A Multi-Agent Reinforcement Learning Approach for Safe and Efficient Behavior Planning of Connected Autonomous Vehicles

Songyang Han, Shanglin Zhou*, Jiangwei Wang*, Lynn Pepin*, Caiwen Ding*, Jie Fu*, Fei Miao*

The recent advancements in wireless technology enable connected autonomous vehicles (CAVs) to gather information about their environment by vehicle-to-vehicle (V2V) communication. In this work, we design an information-sharing-based multi-agent reinforcement learning (MARL) ...

ICLR, 2024

Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation

Kimia Hamidieh*, Haoran Zhang*, Swami Sankaranarayanan, Marzyeh Ghassemi*

Supervised learning methods have been found to exhibit inductive biases favoring simpler features. When such features are spuriously correlated with the label, this can result in suboptimal performance on minority subgroups. Despite the growing popularity of methods which le...

ICLR, 2024

FedWon: Triumphing Multi-domain Federated Learning Without Normalization

Weiming Zhuang, Lingjuan Lyu

Federated learning (FL) enhances data privacy with collaborative in-situ training on decentralized clients. Nevertheless, FL encounters challenges due to non-independent and identically distributed (non-i.i.d) data, leading to potential performance degradation and hindered c...

ICLR, 2024

Detecting, Explaining, and Mitigating Memorization in Diffusion Models

Yuxin Wen, Yuchen Liu*, Chen Chen, Lingjuan Lyu

Recent breakthroughs in diffusion models have exhibited exceptional image-generation capabilities. However, studies show that some outputs are merely replications of training data. Such replications present potential legal challenges for model owners, especially when the gen...

ICLR, 2024

FedP3: Federated Personalized and Privacy-friendly Network Pruning under Model Heterogeneity

Kai Yi, Nidham Gazagnadou, Peter Richtárik*, Lingjuan Lyu

The interest in federated learning has surged in recent research due to its unique ability to train a global model using privacy-secured information held locally on each client. This paper pays particular attention to the issue of client-side model heterogeneity, a pervasive...

ICLR, 2024

Towards Principled Representation Learning from Videos for Reinforcement Learning

Dipendra Misra*, Akanksha Saran, Tengyang Xie*, Alex Lamb*, John Langford*

We study pre-training representations for decision-making using video data, which is abundantly available for tasks such as game agents and software testing. Even though significant empirical advances have been made on this problem, a theoretical understanding remains absent...

ICLR, 2024

SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer

Yuhta Takida, Masaaki Imaizumi*, Takashi Shibuya, Chieh-Hsin Lai, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji

Generative adversarial networks (GANs) learn a target probability distribution by optimizing a generator and a discriminator with minimax objectives. This paper addresses the question of whether such optimization actually provides the generator with gradients that make its d...

ICLR, 2024

Manifold Preserving Guided Diffusion

Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim*, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter*, Ruslan Salakhutdinov*, Stefano Ermon*

Despite the recent advancements, conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training. In this paper, we propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework th...

ICLR, 2024

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

Dongjun Kim*, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon*

Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade off quality for speed. To address this limitation, we propose Consistency Trajectory Model (CTM), a generalization encomp...

FACCT, 2024

Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators

Wiebke Hutiri*, Orestis Papakyriakopoulos*, Alice Xiang

The rapid and wide-scale adoption of AI to generate human speech poses a range of significant ethical and safety risks to society that need to be addressed. For example, a growing number of speech generation incidents are associated with swatting attacks in the United States...

CVPR, 2024

Hearing Anything Anywhere

Mason Long Wang*, Samuel Clarke, Ruohan Gao, Shangzhe Wu, Jiajun Wu

Multimodal representation learning to integrate different modalities, such as text, vision, and audio is important for real-world applications. The symmetric InfoNCE loss proposed in CLIP is a key concept in multimodal representation learning. In this work, we provide a theo...

AAAI, 2024

Asynchronous Task Plan Refinement for Multi-Robot Task and Motion Planning

Yoonchang Sung*, Rahul Shome*, Peter Stone

This paper explores general multi-robot task and motion planning, where multiple robots in close proximity manipulate objects while satisfying constraints and a given goal. In particular, we formulate the plan refinement problem, which, given a task plan, finds valid assignm...

SIGGRAPH ASIA 2023, 2023

MyStyle++: A Controllable Personalized Generative Prior

Libing Zeng*, Lele Chen, Yi Xu*, Nima Kalantari*

In this paper, we propose an approach to obtain a personalized generative prior with explicit control over a set of attributes. We build upon MyStyle, a recently introduced method, that tunes the weights of a pre-trained StyleGAN face generator on a few images of an individu...

NEURIPS, 2023

Ethical Considerations for Responsible Data Curation

Jerone Andrews, Dora Zhao*, William Thong, Apostolos Modas, Orestis Papakyriakopoulos*, Alice Xiang

Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. HCCV datasets constructed through nonconsensual web scraping lack crucial metadata for comprehensive fairness and robustnes...

ICASSP, 2023

Enhancing Semantic Communication with Deep Generative Models -- An ICASSP Special Session Overview

Eleonora Grassucci*, Yuki Mitsufuji, Ping Zhang*, Danilo Comminiello*

Semantic communication is poised to play a pivotal role in shaping the landscape of future AI-driven communication systems. Its challenge of extracting semantic information from the original complex content and regenerating semantically consistent data at the receiver, possi...

ICASSP, 2023

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

Takashi Shibuya, Yuhta Takida, Yuki Mitsufuji

ICASSP, 2023

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance

Carlos Hernandez-Olivan*, Koichi Saito, Naoki Murata, Chieh-Hsin Lai, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji

ICASSP, 2023

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription

Frank Cwitkowitz*, Kin Wai Cheuk, Woosung Choi, Marco A. Martínez-Ramírez, Keisuke Toyama*, Wei-Hsiang Liao, Yuki Mitsufuji

SIGGRAPH ASIA 2023, 2023

Enhancing Diffusion Models with 3D Perspective Geometry Constraints

Rishi Upadhyay*, Howard Zhang*, Yunhao Ba, Ethan Yang*, Blake Gella*, Sicheng Jiang*, Alex Wong*, Achuta Kadambi*

While perspective is a well-studied topic in art, it is generally taken for granted in images. However, for the recent wave of high-quality image synthesis methods such as latent diffusion models, perspective accuracy is not an explicit requirement. Since these methods are c...

NEURIPS, 2023

Posthoc privacy guarantees for collaborative inference with modified Propose-Test-Release

Abhishek Singh*, Praneeth Vepakomma*, Vivek Sharma, Ramesh Raskar*

Cloud-based machine learning inference is an emerging paradigm where users query by sending their data through a service provider who runs an ML model on that data and returns back the answer. Due to increased concerns over data privacy, recent works have proposed Collaborat...

CMMR, 2023

VaryNote: A Method to Automatically Vary the Number of Notes in Symbolic Music

Juan M. Huerta*, Bo Liu*, Peter Stone

Automatically varying the number of notes in symbolic music has various applications in assisting music creators to embellish simple tunes or to reduce complex music to its core idea. In this paper, we formulate the problem of varying the number of notes while preserving the...

NEURIPS, 2023

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

Kazuki Shimada, Archontis Politis*, Parthasaarathy Sudarsanam*, Daniel Krause*, Kengo Uchida, Sharath Adavanne*, Aapo Hakala*, Yuichiro Koyama*, Naoya Takahashi, Shusuke Takahashi*, Tuomas Virtanen*, Yuki Mitsufuji

While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker. This paper pro...

ECAI, 2023

CERM: Context-aware Literature-based Discovery via Sentiment Analysis

Julio Christian Young*, Uchenna Akujuobi

Motivated by the abundance of biomedical publications and the need to better understand the relationship between food and health, we study a new sentiment analysis task based on literature-based discovery. Many attempts have been made to introduce health into recipe recomme...

NEURIPS, 2023

Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception?

Xiaoxiao Sun*, Nidham Gazagnadou, Vivek Sharma, Lingjuan Lyu, Hongdong Li*, Liang Zheng*

Hand-crafted image quality metrics, such as PSNR and SSIM, are commonly used to evaluate model privacy risk under reconstruction attacks. Under these metrics, reconstructed images that are determined to resemble the original one generally indicate more privacy leakage. Image...

NEURIPS, 2023

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

Bo Liu*, Yifeng Zhu*, Chongkai Gao*, Yihao Feng*, Qiang Liu*, Yuke Zhu*, Peter Stone

Lifelong learning offers a promising paradigm of building a generalist agent that learns and adapts over its lifespan. Unlike traditional lifelong learning problems in image and text domains, which primarily involve the transfer of declarative knowledge of entities and conce...

NEURIPS, 2023

FAMO: Fast Adaptive Multitask Optimization

Bo Liu*, Yihao Feng*, Peter Stone, Qiang Liu*

One of the grand enduring goals of AI is to create generalist agents that can learn multiple different tasks from diverse data via multitask learning (MTL). However, gradient descent (GD) on the average loss across all tasks may yield poor multitask performance due to severe...

NEURIPS, 2023

Elden: Exploration via Local Dependencies

Zizhao Wang*, Jiaheng Hu*, Roberto Martin-Martin*, Peter Stone

Tasks with large state space and sparse reward present a longstanding challenge to reinforcement learning. In these tasks, an agent needs to explore the state space efficiently until it finds reward: the hard exploration problem. To deal with this problem, the community has ...

NEURIPS, 2023

f-Policy Gradients: A General Framework for Goal-Conditioned RL using f-Divergences

Siddhant Agarwal*, Ishan Durugkar, Amy Zhang*, Peter Stone

Goal-Conditioned RL problems provide sparse rewards where the agent receives a reward signal only when it has achieved the goal, making exploration a difficult problem. Several works augment this sparse reward with a learned dense reward function, but this can lead to subopt...

NEURIPS, 2023

Differentially Private Image Classification by Learning Priors from Random Processes

Xinyu Tang*, Ashwinee Panda*, Vikash Sehwag, Prateek Mittal*

In privacy-preserving machine learning, differentially private stochastic gradient descent (DP-SGD) performs worse than SGD due to per-sample gradient clipping and noise addition. A recent focus in private learning research is improving the performance of DP-SGD on private da...

NEURIPS, 2023

UltraRE: Enhancing RecEraser for Recommendation Unlearning via Error Decomposition

Yuyuan Li*, Chaochao Chen*, Yizhao Zhang*, Weiming Liu*, Lingjuan Lyu, Xiaolin Zheng*, Dan Meng*, Jun Wang*

With growing concerns regarding privacy in machine learning models, regulations have committed to granting individuals the right to be forgotten while mandating companies to develop non-discriminatory machine learning systems, thereby fueling the study of the machine unlearn...

NEURIPS, 2023

Towards Personalized Federated Learning via Heterogeneous Model Reassembly

Jiaqi Wang*, Xingyi Yang*, Suhan Cui*, Liwei Che*, Lingjuan Lyu, Dongkuan Xu*, Fenglong Ma*

This paper focuses on addressing the practical yet challenging problem of model heterogeneity in federated learning, where clients possess models with different network structures. To tackle this problem, we propose a novel framework called pFedHR, which leverages heterogeneo...

NEURIPS, 2023

Is Heterogeneity Notorious? Taming Heterogeneity to Handle Test-Time Shift in Federated Learning

Yue Tan, Chen Chen, Weiming Zhuang, Xin Dong, Lingjuan Lyu, Guodong Long*

Federated learning (FL) is an effective machine learning paradigm where multiple clients can train models based on heterogeneous data in a decentralized manner without accessing their private data. However, existing FL systems undergo performance deterioration due to feature...

NEURIPS, 2023

Where Did I Come From? Origin Attribution of AI-Generated Images

Zhenting Wang, Chen Chen, Yi Zeng, Lingjuan Lyu, Shiqing Ma*

Image generation techniques have been gaining increasing attention recently, but concerns have been raised about the potential misuse and intellectual property (IP) infringement associated with image generation models. It is, therefore, necessary to analyze the origin of ima...

NEURIPS, 2023

Towards a fuller understanding of neurons with Clustered Compositional Explanations

Biagio La Rosa*, Leilani H. Gilpin*, Roberto Capobianco

Compositional Explanations is a method for identifying logical formulas of concepts that approximate the neurons' behavior. However, these explanations are linked to the small spectrum of neuron activations used to check the alignment (i.e., the highest ones), thus lacking c...

NEURIPS, 2023

Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors

Swami Sankaranarayanan, Thomas Hartvigsen*, Hamid Palangi*, Yoon Kim*, Marzyeh Ghassemi*

Deployed models decay over time due to shifting inputs, changing user needs, or emergent knowledge gaps. When harmful behaviors are identified, targeted edits are required. However, current model editors, which adjust specific behaviors of pre-trained models, degrade model p...

ISMIR, 2023

Automatic Piano Transcription with Hierarchical Frequency-Time Transformer

Keisuke Toyama*, Taketo Akama*, Yukara Ikemiya, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji

Taking long-term spectral and temporal dependencies into account is essential for automatic piano transcription. This is especially helpful when determining the precise onset and offset for each note in the polyphonic piano content. In this case, we may rely on the capabilit...

IEEE MSLP, 2023

Memory Replay For Continual Learning With Spiking Neural Networks

Michela Proietti*, Alessio Ragno*, Roberto Capobianco

Two of the most impressive features of biological neural networks are their high energy efficiency and their ability to continuously adapt to varying inputs. On the contrary, the amount of power required to train top-performing deep learning models rises as they become more ...

MACHINE LEARNING, 2023

Explainable AI in drug discovery: self-interpretable graph neural network for molecular property prediction using concept whitening

Michela Proietti*, Alessio Ragno*, Biagio La Rosa*, Rino Ragno*, Roberto Capobianco

Molecular property prediction is a fundamental task in the field of drug discovery. Several works use graph neural networks to leverage molecular graph representations. Although they have been successfully applied in a variety of applications, their decision process is not t...

AIXIA, 2023

Understanding Deep RL agent decisions: a novel interpretable approach with trainable prototypes

Caterina Borzillo*, Alessio Ragno*, Roberto Capobianco

Deep reinforcement learning (DRL) models have shown great promise in various applications, but their practical adoption in critical domains is limited due to their opaque decision-making processes. To address this challenge, explainable AI (XAI) techniques aim to enhance tra...

NEURIPS, 2023

FRUNI and FTREE synthetic knowledge graphs for evaluating explainability

Pablo Sanchez Martin, Tarek Besold, Priyadarshini Kumari

Research on knowledge graph completion (KGC), i.e., link prediction within incomplete KGs, is witnessing significant growth in popularity. Recently, KGC using KG embedding (KGE) models, primarily based on complex architectures (e.g., transformers), have achieved remarkable...

WASPAA, 2023

Extending Audio Masked Autoencoders Toward Audio Restoration

Zhi Zhong*, Hao Shi*, Masato Hirano*, Kazuki Shimada, Kazuya Tateishi*, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji

Audio classification and restoration are among major downstream tasks in audio signal processing. However, restoration derives less of a benefit from pretrained models compared to the overwhelming success of pretrained models in classification tasks. Due to such unbalanced b...

ICIP, 2023

Query by Activity Video in the Wild

Tao Hu*, William Thong, Pascal Mettes*, Cees Snoek*

This paper considers retrieval of videos containing human activity from just a video query. In the literature, a common assumption is that all activities have sufficient labelled examples when learning an embedding for retrieval. However, this assumption does not hold in pra...

ICCV, 2023

MAS: Towards Resource-Efficient Federated Multiple-Task Learning

Weiming Zhuang, Yonggang Wen*, Shuai Zhang*, Lingjuan Lyu

Federated learning (FL) is an emerging distributed machine learning method that empowers in-situ model training on decentralized edge devices. However, multiple simultaneous FL tasks could overload resource-constrained devices. In this work, we propose the first FL system to...

ICCV, 2023

The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning

Virat Shejwalkar, Lingjuan Lyu, Amir Houmansadr*

Semi-supervised machine learning (SSL) is gaining popularity as it reduces the cost of training ML models. It does so by using very small amounts of (expensive, well-inspected) labeled data and large amounts of (cheap, non-inspected) unlabeled data. SSL has shown comparable ...

ICCV, 2023

TARGET: Federated Class-Continual Learning via Exemplar-Free Distillation

Jie Zhang*, Chen Chen, Weiming Zhuang, Lingjuan Lyu

This paper focuses on an under-explored yet important problem: Federated Class-Continual Learning (FCCL), where new classes are dynamically added in federated learning. Existing FCCL works suffer from various limitations, such as requiring additional datasets or storing the ...

ICCV, 2023

Spatio-Temporal Convolution-Attention Video Network

Ali Diba*, Vivek Sharma, Mohammad M. Arzani*, Luc Van Gool*

In this paper, we present a hierarchical neural network based on convolutional and attention modeling for short and long-range video reasoning, called Spatio-Temporal Convolution-Attention Video Network (STCA). The proposed method is capable of learning appearance and tempor...

IROS, 2023

A Novel Control Law for Multi-joint Human-Robot Interaction Tasks While Maintaining Postural Coordination

Keya Ghonasgi*, Reuth Mirsky*, Adrian M Haith*, Peter Stone, Ashish D Deshpande*

Exoskeleton robots are capable of safe torque-controlled interactions with a wearer while moving their limbs through pre-defined trajectories. However, affecting and assisting the wearer's movements while incorporating their inputs (effort and movements) effectively during a...

IROS, 2023

Symbolic State Space Optimization for Long Horizon Mobile Manipulation Planning

Xiaohan Zhang*, Yifeng Zhu*, Yan Ding*, Yuqian Jiang*, Yuke Zhu*, Peter Stone, Shiqi Zhang*

In existing task and motion planning (TAMP) research, it is a common assumption that experts manually specify the state space for task-level planning. A well-developed state space enables the desirable distribution of limited computational resources between task planning and...

ICCV, 2023

NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions

Zhang Chen*, Zhong Li*, Liangchen Song*, Lele Chen, Jingyi Yu*, Yi Xu*

We present a novel type of neural fields that uses general radial bases for signal representation. State-of-the-art neural fields typically rely on grid-based representations for storing local neural features and N-dimensional linear kernels for interpolating features at con...

ICCV, 2023

Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting

Wentao Bao*, Lele Chen, Libing Zeng*, Zhong Li*, Yi Xu*, Junsong Yuan*, Yu Kong*

Hand trajectory forecasting from egocentric views is crucial for enabling a prompt understanding of human intentions when interacting with AR/VR systems. However, existing methods handle this problem in a 2D image space which is inadequate for 3D real-world applications. In ...

ICCV, 2023

Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color

William Thong, Przemyslaw Joniak*, Alice Xiang

This paper strives to measure apparent skin color in computer vision, beyond a unidimensional scale on skin tone. In their seminal paper Gender Shades, Buolamwini and Gebru have shown how gender classification systems can be biased against women with darker skin tones. While...

COLLAS, 2023

Event Tables for Efficient Experience Replay

Varun Kompella, Thomas Walsh, Samuel Barrett, Peter R. Wurman, Peter Stone

Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. This paper introduces Stratified Sampling from Event Tables (SSET), whi...

INTERSPEECH, 2023

Iteratively Improving Speech Recognition and Voice Conversion

Mayank Kumar Singh*, Naoya Takahashi, Onoe Naoyuki*

Many existing works on voice conversion (VC) tasks use automatic speech recognition (ASR) models for ensuring linguistic consistency between source and converted samples. However, for the low-data resource domains, training a high-quality ASR remains a challenging task...

IJCAI, 2023

BRExIt: On Opponent Modelling in Expert Iteration

Daniel Hernandez, Hendrik Baier*, Michael Kaisers*

Finding a best response policy is a central objective in game theory and multi-agent learning, with modern population-based training approaches employing reinforcement learning algorithms as best response oracles to improve play against candidate opponents (typically previou...

IJCAI, 2023

A Pathway Towards Responsible AI Generated Content

Lingjuan Lyu

AI Generated Content (AIGC) has received tremendous attention within the past few years, with content ranging from image, text, to audio, video, etc. Meanwhile, AIGC has become a double-edged sword and recently received much criticism regarding its responsible usage. In this...

IJCAI, 2023

RAIN: RegulArization on Input and Network for Black-Box Domain Adaptation

Qucheng Peng*, Zhengming Ding*, Lingjuan Lyu, Lichao Sun*, Chen Chen

Source-Free domain adaptation transits the source-trained model towards target domain without exposing the source data, trying to dispel these concerns about data privacy and security. However, this paradigm is still at risk of data leakage due to adversarial attacks on the ...

IJCAI, 2023

FedSampling: A Better Sampling Strategy for Federated Learning

Tao Qi*, Fangzhao Wu*, Lingjuan Lyu, Yongfeng Huang*, Xing Xie*

Federated learning (FL) is an important technique for learning models from decentralized data in a privacy-preserving way. Existing FL methods usually uniformly sample clients for local model learning in each round. However, different clients may have significantly different...

IJCAI, 2023

Reducing Communication for Split Learning by Randomized Top-k Sparsification

Fei Zheng*, Chaochao Chen*, Lingjuan Lyu, Binhui Yao*

The EU AI Act proposal addresses, among other applications, AI systems that enable facial classification and emotion recognition. As part of previous work, we have investigated how citizens deliberate about the validity of AI-based facial classifications in the advertisement...

AIES, 2023

Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data

Keziah Naggita*, Julienne LaChance, Alice Xiang

Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centri...

USENIX SECURITY, 2023

Meta-Sift: How to Sift Out a Clean Subset in the Presence of Data Poisoning?

Yi Zeng, Minzhou Pan*, Himanshu Jahagirdar*, Lingjuan Lyu, Ruoxi Jia*

External data sources are increasingly being used to train machine learning (ML) models as the data demand increases. However, the integration of external data into training poses data poisoning risks, where malicious providers manipulate their data to compromise the utility...

KDD, 2023

PrivateRec: Differentially Private Model Training and Online Serving for Federated News Recommendation

Ruixuan Liu*, Yanlin Wang*, Yang Cao*, Lingjuan Lyu, Weike Pan*, Yun Chen*, Hong Chen*

Collecting and training over sensitive personal data raise severe privacy concerns in personalized recommendation systems, and federated learning can potentially alleviate the problem by training models over decentralized user data. However, a theoretically private solution i...

UAI, 2023

Composing Efficient, Robust Tests for Policy Selection

Dustin Morrill, Thomas Walsh, Daniel Hernandez, Peter R. Wurman, Peter Stone

Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. We introduce RPOSST, a...

COLLAS, 2023

Model-Based Meta Automatic Curriculum Learning

Zifan Xu*, Yulin Zhang*, Shahaf S. Shperberg*, Reuth Mirsky*, Yuqian Jiang*, Bo Liu*, Peter Stone

Curriculum learning (CL) has been widely explored to facilitate the learning of hard-exploration tasks in reinforcement learning (RL) by training a sequence of easier tasks, often called a curriculum. While most curricula are built either manually or automatically based on h...

KR, 2023

Grounding LTLf Specifications in Image Sequences

Elena Umili*, Roberto Capobianco, Giuseppe De Giacomo*

A critical challenge in neuro-symbolic (NeSy) approaches is to handle the symbol grounding problem without direct supervision. That is mapping high-dimensional raw data into an interpretation over a finite set of abstract concepts with a known meaning, without using labels. ...

IEEE TASLP, 2023

The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation

Naoya Takahashi, Stefan Uhlich*, Shusuke Takahashi*, Yuki Mitsufuji

This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) without increasing calculation cost. It consists of three components: (i) multi-domain loss (MDL), (ii) bridging operation, which...

INTERSPEECH, 2023

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement

Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji

Although deep neural network (DNN)-based speech enhancement (SE) methods outperform the previous non-DNN-based ones, they often degrade the perceptual quality of generated outputs. To tackle this problem, we introduce a DNN-based generative refiner, Diffiner, aiming to impro...

ICCP, 2023

Learning to Synthesize Photorealistic Dual-pixel Images from RGBD frames

Feiran Li, Heng Guo*, Hiroaki Santo*, Fumio Okura*, Yasuyuki Matsushita*

Recent advances in data-driven dual-pixel (DP) research are bottlenecked by the difficulties in reaching large-scale DP datasets, and a photorealistic image synthesis approach appears to be a credible solution. To benchmark the accuracy of various existing DP image simulator...

ICML, 2023

Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting

Yuchen Liu*, Chen Chen, Lingjuan Lyu, Fangzhao Wu*, Sai Wu*, Gang Chen*

Federated learning has exhibited vulnerabilities to Byzantine attacks, where the Byzantine attackers can send arbitrary gradients to the central server to destroy the convergence and performance of the global model. A wealth of defenses have been proposed to defend against B...

ICML, 2023

Revisiting Data-Free Knowledge Distillation with Poisoned Teachers

Junyuan Hong, Yi Zeng, Shuyang Yu*, Lingjuan Lyu, Ruoxi Jia*, Jiayu Zhou*

Data-free knowledge distillation (KD) helps realistically transfer knowledge from a pre-trained model (known as the teacher model) to a smaller model (known as the student model) without access to the original training data used for training the teacher model. However, the s...

ICML, 2023

GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration

Naoki Murata, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Yuki Mitsufuji

Pre-trained diffusion models have been successfully used as priors in a variety of linear inverse problems, where the goal is to reconstruct a signal from noisy linear measurements. However, existing approaches require knowledge of the linear operator. In this paper, we prop...

ICML, 2023

Men Also Do Laundry: Multi-Attribute Bias Amplification

Dora Zhao*, Jerone T. A. Andrews, Alice Xiang

As computer vision systems become more widely deployed, there is increasing concern from both the research community and the public that these systems are not only reproducing but amplifying harmful social biases. The phenomenon of bias amplification, which is the focus of t...

ICML, 2023

Dimension-independent Certified Neural Network Watermarks via Mollifier Smoothing

Jiaxiang Ren*, Yang Zhou*, Jiayin Jin*, Lingjuan Lyu, Da Yan*

Certified_Watermarks is the first to provide a watermark certificate against 𝑙2-norm watermark removal attacks, by leveraging the randomized smoothing techniques for certified robustness to adversarial attacks. However, the randomized smoothing techniques suffer from hardnes...

ICML, 2023

Fast Federated Machine Unlearning with Nonlinear Functional Theory

Tianshi Che*, Yang Zhou*, Zijie Zhang*, Lingjuan Lyu, Ji Liu*, Da Yan*, Dejing Dou*, Jun Huan*

Federated machine unlearning (FMU) aims to remove the influence of a specified subset of training data upon request from a trained federated learning model. Despite achieving remarkable performance, existing FMU techniques suffer from inefficiency due to two sequential opera...

ICML, 2023

Reconstructive Neuron Pruning for Backdoor Defense

Yige Li*, Xixiang Lyu*, Xingjun Ma*, Nodens Koren*, Lingjuan Lyu, Bo Li*, Yu-Gang Jiang*

Deep neural networks (DNNs) have been found to be vulnerable to backdoor attacks, raising security concerns about their deployment in mission-critical applications. While existing defense methods have demonstrated promising results, it is still not clear how to effectively r...

ACL, 2023

Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

Wenjun Peng*, Jingwei Yi*, Fangzhao Wu*, Shangxi Wu*, Bin Bin Zhu*, Lingjuan Lyu, Binxing Jiao*, Guangzhong Sun*, Xing Xie*

Large language models (LLMs) have demonstrated powerful capabilities in both text understanding and generation. Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NLP) tasks for customers. H...

FACCT, 2023

Augmented data sheets for speech datasets and ethical decision-making

Orestis Papakyriakopoulos*, Anna Seo Gyeong Choi*, William Thong, Dora Zhao*, Jerone Andrews, Rebecca Bourke, Alice Xiang, Allison Koenecke*

Human-centric image datasets are critical to the development of computer vision technologies. However, recent investigations have foregrounded significant ethical issues related to privacy and bias, which have resulted in the complete retraction, or modification, of several ...

ICML, 2023

Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators

Paavo Parmas*, Takuma Seno, Yuma Aoki*

In model-based reinforcement learning (MBRL), policy gradients can be estimated either by derivative-free RL methods, such as likelihood ratio gradients (LR), or by backpropagating through a differentiable model via reparameterization gradients (RP). Instead of using one or ...

NESY, 2023

What's Wrong with Gradient-based Complex Query Answering?

Ouns El Harzli, Samy Badreddine, Tarek Besold

Multi-hop query answering on knowledge graphs is known to be a challenging computational task. Neurosymbolic approaches using neural link predictors have shown promising results but are still outperformed by combinatorial optimization methods on several benchmarks, including...

SCIENCE, 2023

Improving Artificial Intelligence with Games

Peter R. Wurman, Peter Stone, Michael Spranger

Games continue to drive progress in the development of artificial intelligence.

RSS, 2023

Causal Policy Gradient for Whole-Body Mobile Manipulation

Jiaheng Hu*, Peter Stone, Roberto Martin-Martin*

Developing the next generation of household robot helpers requires combining locomotion and interaction capabilities, which is generally referred to as mobile manipulation (MoMa). MoMa tasks are difficult due to the large action space of the robot and the common multi-objec...

TAS, 2023

"What's That Robot Doing Here?": Factors Influencing Perceptions Of Incidental Encounters With Autonomous Quadruped Robots.

Elliott Hauser*, Yao-Cheng Chan*, Geethika Hemkumar*, Daksh Dua*, Parth Chonkar*, Efren Mendoza Enriquez*, Tiffany Kao*, Shikhar Gupta*, Huihai Wang*, Justin Hart*, Reuth Mirsky*, Joydeep Biswas*, Junfeng Jiao*, Peter Stone

Autonomous service robots in a public setting will generate hundreds of incidental human-robot encounters, yet researchers have only recently addressed this important topic in earnest. In this study, we hypothesized that visual indicators of human control, such as a leash on...

ICAPS, 2023

Task Phasing: Automated Curriculum Learning from Demonstrations

Vaibhav Bajaj*, Guni Sharon*, Peter Stone

Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals. Common RL techniques for addressing such domains include (1) learning from demonstrations and (2) curriculum learning. While these two approaches ha...

NESY, 2023

Visual Reward Machines

Elena Umili*, Francesco Argenziano*, Aymeric Barbin*, Roberto Capobianco

Non-Markovian Reinforcement Learning (RL) tasks are extremely hard to solve, because intelligent agents must consider the entire history of state-action pairs to act rationally in the environment. Most works use Linear Temporal Logic (LTL) to specify temporally-extended task...

ICML, 2023

FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation

Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon*

Score-based generative models learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are tied together by the Fokker-Planck equation (FPE), a partial differentia...

EWAF, 2023

A Reflection on How Cross-Cultural Perspectives on the Ethics of Facial Analysis AI Can Inform EU Policymaking

Chiara Ullstein*, Severin Engelmann*, Orestis Papakyriakopoulos*, Jens Grossklags*

ICASSP, 2023

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich*, Kyogu Lee*, Yuki Mitsufuji

We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio effects related information from a r...

ICASSP, 2023

Nonparallel Emotional Voice Conversion for unseen speaker-emotion pairs using dual domain adversarial network & Virtual Domain Pairing

Nirmesh Shah*, Mayank Kumar Singh*, Naoya Takahashi, Naoyuki Onoe*

The primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another without modifying the linguistic content of the signal. Most of the state-of-the-art approaches convert emotions for seen speaker-emo...

ICASSP, 2023

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability

Kin Wai Cheuk, Toshimitsu Uesaka, Naoki Murata, Naoya Takahashi, Shusuke Takahashi*, Dorien Herremans*, Yuki Mitsufuji

In this paper we propose a novel generative approach, DiffRoll, to tackle automatic music transcription (AMT). Instead of treating AMT as a discriminative task in which the model is trained to convert spectrograms into piano rolls, we think of it as a conditional generative t...

ICASSP, 2023

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Naoya Takahashi, Mayank Kumar Singh*, Yuki Mitsufuji

Recent progress in deep generative models has improved the quality of neural vocoders in speech domain. However, generating a high-quality singing voice remains challenging due to a wider variety of musical expressions in pitch, loudness, and pronunciations. In this work, we...

ICASSP, 2023

Towards Adversarially Robust Continual Learning

Tao Bai, Chen Chen, Lingjuan Lyu, Jun Zhao*, Bihan Wen*

Recent studies show that models trained by continual learning can achieve performance comparable to standard supervised learning, and that the learning flexibility of continual learning models enables their wide application in the real world. Deep learning models, howeve...

ICASSP, 2023

Unsupervised vocal dereverberation with diffusion-based generative models

Koichi Saito, Naoki Murata, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuhta Takida, Takao Fukui*, Yuki Mitsufuji

Removing reverb from reverberant music is a necessary technique to clean up audio for downstream music manipulations. Reverberation in music falls into two categories: natural reverb and artificial reverb. Artificial reverb has a wider diversity than natural reverb due to its...

SCIENTIFIC REPORTS, 2023

Kinematic coordinations capture learning during human-exoskeleton interaction

Keya Ghonasgi*, Reuth Mirsky*, Nisha Bhargava*, Adrian M Haith*, Peter Stone, Ashish D Deshpande*

Human–exoskeleton interactions have the potential to bring about changes in human behavior for physical rehabilitation or skill augmentation. Despite significant advances in the design and control of these robots, their application to human training remains limited. The key o...

ICLR, 2023

MocoSFL: enabling cross-client collaborative self-supervised learning

Jingtao Li, Lingjuan Lyu, Daisuke Iso, Chaitali Chakrabarti*, Michael Spranger

Existing collaborative self-supervised learning (SSL) schemes are not suitable for cross-client applications because of their expensive computation and large local data requirements. To address these issues, we propose MocoSFL, a collaborative SSL framework based on Split Fe...

ICLR, 2023

IDEAL: Query-Efficient Data-Free Learning from Black-Box Models

Jie Zhang*, Chen Chen, Lingjuan Lyu

Knowledge Distillation (KD) is a typical method for training a lightweight student model with the help of a well-trained teacher model. However, most KD methods require access to either the teacher's training data or model parameter, which is unrealistic. To tackle this prob...

AAMAS, 2023

D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning.

Caroline Wang*, Garrett Warnell*, Peter Stone

While combining imitation learning (IL) and reinforcement learning (RL) is a promising way to address poor sample efficiency in autonomous behavior acquisition, methods that do so typically assume that the requisite behavior demonstrations are provided by an expert that be...

ICLR, 2023

MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection

Jiaxun Cui*, Xiaomeng Yang*, Mulong Luo*, Geunbae Lee*, Peter Stone, Hsien-Hsin S. Lee*, Benjamin Lee*, G. Edward Suh*, Wenjie Xiong*, Yuandong Tian*

Security vulnerabilities in computer systems raise serious concerns as computers process an unprecedented amount of private and sensitive data today. Cache timing attacks (CTA) pose an important practical threat as they can effectively breach many protection mechanisms in t...

ICLR, 2023

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

Hao-Wen Dong*, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley*, Taylor Berg-Kirkpatrick*

Recent years have seen progress beyond domain-specific sound separation for speech or music towards universal sound separation for arbitrary sounds. Prior work on universal sound separation has investigated separating a target sound out of an audio mixture given a text query...

ICLR, 2023

Twofer: Tackling Continual Domain Shift with Simultaneous Domain Generalization and Adaptation

Chenxi Liu*, Lixu Wang, Lingjuan Lyu, Chen Sun*, Xiao Wang*, Qi Zhu*

In real-world applications, deep learning models often run in non-stationary environments where the target data distribution continually shifts over time. There have been numerous domain adaptation (DA) methods in both online and offline modes to improve cross-domain adaptat...

ICLR, 2023

MECTA: Memory-Economic Continual Test-Time Model Adaptation

Junyuan Hong, Lingjuan Lyu, Jiayu Zhou*, Michael Spranger

Continual Test-time Adaptation (CTA) is a promising art to secure accuracy gains in continually-changing environments. The state-of-the-art adaptations improve out-of-distribution model accuracy via computation-efficient online test-time gradient descents but meanwhile cost ...

ICRA, 2023

Learning Perceptual Hallucination for Multi-Robot Navigation in Narrow Hallways

Jin-Soo Park*, Xuesu Xiao*, Garrett Warnell*, Harel Yedidsion*, Peter Stone

While current systems for autonomous robot navigation can produce safe and efficient motion plans in static environments, they usually generate suboptimal behaviors when multiple robots must navigate together in confined spaces. For example, when two robots meet each other i...

ICRA, 2023

Benchmarking Reinforcement Learning Techniques for Autonomous Navigation

Zifan Xu*, Bo Liu*, Xuesu Xiao*, Anirudh Nair*, Peter Stone

Deep reinforcement learning (RL) has brought many successes for autonomous robot navigation. However, there still exist important limitations that prevent real-world use of RL-based navigation systems. For example, most learning approaches lack safety guarantees; and learned na...

ICLR, 2023

A View From Somewhere: Human-Centric Face Representations

Jerone T. A. Andrews, Przemyslaw Joniak*, Alice Xiang

Few datasets contain self-identified sensitive attributes, inferring attributes risks introducing additional biases, and collecting attributes can carry legal risks. Besides, categorical labels can fail to reflect the continuous nature of human phenotypic diversity, making i...

ICLR, 2023

Towards Robustness Certification Against Universal Perturbations

Yi Zeng, Zhouxing Shi*, Ming Jin*, Feiyang Kang*, Lingjuan Lyu, Cho-Jui Hsieh*, Ruoxi Jia*

In this paper, we investigate the problem of certifying neural network robustness against universal perturbations (UPs), which have been widely used in universal adversarial attacks and backdoor attacks. Existing robustness certification methods aim to provide robustness gua...

WWW, 2023

Minimum Topology Attacks for Graph Neural Networks

Mengmei Zhang*, Xiao Wang*, Chuan Shi*, Lingjuan Lyu, Tianchi Yang*, Junping Du*

With the great popularity of Graph Neural Networks (GNNs), their robustness to adversarial topology attacks has received increasing attention. Although many attack methods have been proposed, they mainly focus on fixed-budget attacks, aiming at finding the most adversarial p...

JAIR, 2023

An Overview of Environmental Features that Impact Deep Reinforcement Learning in Sparse-Reward Domains

Jim Martin Catacora Ocaña*, Roberto Capobianco, Daniele Nardi*

Deep reinforcement learning has achieved impressive results in recent years; yet, it is still severely troubled by environments showcasing sparse rewards. On top of that, not all sparse-reward environments are created equal, i.e., they can differ in the presence or absence of ...

ICDE, 2023

T-PAIR: Temporal Node-pair Embedding for Automatic Biomedical Hypothesis Generation (Extended abstract)

Uchenna Akujuobi, Michael Spranger, Sucheendra Palaniappan*, Xiangliang Zhang*

In this paper, we study an automatic hypothesis generation (HG) problem, which refers to the discovery of meaningful implicit connections between scientific terms, including but not limited to diseases, chemicals, drugs, and genes extracted from databases of biomedical publi...

APIN, 2023

A self-interpretable module for deep image classification on small data

Jim Martin Catacora Ocaña*, Roberto Capobianco, Daniele Nardi*

Deep neural networks are the driving force of the recent explosion of machine learning applications in everyday life. However, they usually require a lot of training data to work well, and they act as black-boxes, making predictions without any explanation about them. This p...

CHI, 2023

Upvotes? Downvotes? No Votes? Understanding the relationship between reaction mechanisms and political discourse on Reddit

Orestis Papakyriakopoulos*, Severin Engelmann*, Amy Winecoff*

A significant share of political discourse occurs online on social media platforms. Policymakers and researchers try to understand the role of social media design in shaping the quality of political discourse around the globe. In the past decades, scholarship on political di...

HARVARD JOURNAL OF LAW & TECHNOLOGY, 2023

Being 'Seen' vs. 'Mis-Seen': Tensions between Privacy and Fairness in Computer Vision

Alice Xiang

The rise of facial recognition and related computer vision technologies has been met with growing anxiety over the potential for artificial intelligence (“AI”) to create mass surveillance systems and further entrench societal biases. These concerns have led to calls for grea...

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023

Machine Learning Security in Industry: A Quantitative Survey

L. Bieringer*, K. Grosse*, Tarek Besold, B. Biggio*, K. Krombholz*

Despite the large body of academic work on machine learning security, little is known about the occurrence of attacks on machine learning systems in the wild. In this paper, we report on a quantitative study with 139 industrial practitioners. We analyze attack occurrence and...

NEURAL NETWORKS, 2023

A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

Megan M. Baker*, Alexander New*, Mario Aguilar-Simon*, Ziad Al-Halah*, Sébastien M. R. Arnold*, Ese Ben-Iwhiwhu*, Andrew P. Brna*, Ethan Brooks*, Ryan C. Brown*, Zachary Daniels*, Anurag Daram*, Fabien Delattre*, Ryan Dellana*, Eric Eaton*, Haotian Fu*, Kristen Grauman*, Jesse Hostetler*, Shariq Iqbal*, Cassandra Kent*, Nicholas Ketz*, Soheil Kolouri*, George Konidaris*, Dhireesha Kudithipudi*, Seungwon Lee*, Michael L. Littman*, Sandeep Madireddy*, Jorge A. Mendez*, Eric Q. Nguyen*, Christine D. Piatko*, Praveen K. Pilly*, Aswin Raghavan*, Abrar Rahman*, Santhosh Kumar Ramakrishnan*, Neale Ratzlaff*, Andrea Soltoggio*, Peter Stone, Indranil Sur*, Zhipeng Tang*, Saket Tiwari*, Kyle Vedder*, Felix Wang*, Zifan Xu*, Angel Yanguas-Gil*, Harel Yedidsion*, Shangqun Yu*, Gautam K. Vallabha*

Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to “real world” events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and syst...

ARTIFICIAL INTELLIGENCE, 2023

Reward (Mis)design for autonomous driving

W. Bradley Knox*, Alessandro Allievi*, Holger Banzhaf*, Felix Schmitt*, Peter Stone

This article considers the problem of diagnosing certain common errors in reward design. Its insights are also applicable to the design of cost functions and performance metrics more generally. To diagnose common errors, we develop 8 simple sanity checks for identifying flaw...

AAAI, 2023

The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications

Serena Booth*, W. Bradley Knox*, Julie Shah*, Scott Niekum*, Peter Stone, Alessandro Allievi*

In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance metric is often sparse. For example, a true task metric might encode a reward of 1 upon success and 0 otherwise. These sparse task metrics can be hard to learn from, so in pr...

AAAI, 2023

Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning

Bo Liu*, Yihao Feng*, Qiang Liu*, Peter Stone

Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications, including manipulation and navigation problems in robotics. Especially in such robotics tasks, sample efficiency is of the utmost importance for GCRL since, by default, the ...

AAAI, 2023

Defending Against Backdoor Attacks in Natural Language Generation

Xiaofei Sun*, Xiaoya Li*, Yuxian Meng*, Xiang Ao*, Lingjuan Lyu, Jiwei Li*, Tianwei Zhang*

The frustratingly fragile nature of neural network models makes current natural language generation (NLG) systems prone to backdoor attacks and liable to generate malicious sequences that could be sexist or offensive. Unfortunately, little effort has been invested in how backdoor attac...

AAAI, 2023

Delving into the Adversarial Robustness of Federated Learning

Zijie Zhang*, Bo Li*, Chen Chen, Lingjuan Lyu, Shuang Wu*, Shouhong Ding*, Chao Wu*

In Federated Learning (FL), models are as fragile as centrally trained models against adversarial examples. However, the adversarial robustness of federated learning remains largely unexplored. This paper casts light on the challenge of adversarial robustness of federated le...

WSDM, 2023

Considerations for Ethical Speech Recognition Datasets

Orestis Papakyriakopoulos*, Alice Xiang

Speech AI Technologies are largely trained on publicly available datasets or by the massive web-crawling of speech. In both cases, data acquisition focuses on minimizing collection effort, without necessarily taking the data subjects’ protection or user needs into considerat...

AUTONOMOUS ROBOTS, 2023

Multimodal Embodied Attribute Learning by Robots for Object-Centric Action Policies.

Xiaohan Zhang*, Saeid Amiri*, Jivko Sinapov*, Jesse Thomason*, Peter Stone, Shiqi Zhang*

Robots frequently need to perceive object attributes, such as red, heavy, and empty, using multimodal exploratory behaviors, such as look, lift, and shake. One possible way for robots to do so is to learn a classifier for each perceivable attribute given an exploratory behav...

CGF, 2023

State of the art of visual analytics for explainable deep learning

Biagio La Rosa*, Graziano Blasilli*, R Bourqui*, D Auber*, Giuseppe Santucci*, Roberto Capobianco, Enrico Bertini*, Romain Giot*, Marco Angelini*

The use and creation of machine-learning-based solutions to solve problems or reduce their computational costs are becoming increasingly widespread in many domains. Deep Learning plays a large part in this growth. However, it has drawbacks such as a lack of explainability an...

AAAI, 2023

DM2: Distributed Multi-Agent Reinforcement Learning via Distribution Matching

Caroline Wang*, Ishan Durugkar, Elad Liebman*, Peter Stone

Current approaches to multi-agent cooperation rely heavily on centralized mechanisms or explicit communication protocols to ensure convergence. This paper studies the problem of distributed multi-agent learning without resorting to centralized components or explicit communic...

NEURIPS, 2022

BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

Bo Liu*, Mao Ye*, Stephen Wright*, Peter Stone, Qiang Liu*

Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning. Conventional BO methods need to differentiate through the...

NEURIPS, 2022

Causality for Temporal Unfairness Evaluation and Mitigation

Aida Rahmattalabi, Alice Xiang

Recent interest in causality for fair decision-making systems has been accompanied by great skepticism due to practical and epistemological challenges with applying existing causal fairness approaches. Existing works mainly seek to remove the causal effect of social categ...

NEURIPS, 2022

A View From Somewhere: Human-Centric Face Representations

Jerone T. A. Andrews, Przemyslaw Joniak*, Alice Xiang

We propose to implicitly learn a set of continuous face-varying dimensions, without ever asking an annotator to explicitly categorize a person. We uncover the dimensions by learning on a novel dataset of 638,180 human judgments of face similarity (FAX). We demonstrate the ut...

NEURIPS, 2022

A View From Somewhere: Human-Centric Face Representations

Jerone T. A. Andrews, Przemyslaw Joniak*, Alice Xiang

Biases in human-centric computer vision models are often attributed to a lack of sufficient data diversity, with many demographics insufficiently represented. However, auditing datasets for diversity can be difficult, due to an absence of ground-truth labels of relevant feat...

NEURIPS, 2022

Proppo: a Message Passing Framework for Customizable and Composable Learning Algorithms

Paavo Parmas*, Takuma Seno

While existing automatic differentiation (AD) frameworks allow flexibly composing model architectures, they do not provide the same flexibility for composing learning algorithms; everything has to be implemented in terms of back propagation. To address this gap, we invent A...

NEURIPS, 2022

Value Function Decomposition for Iterative Design of Reinforcement Learning Agents

James MacGlashan, Evan Archer, Alisa Devlic, Takuma Seno, Craig Sherstan, Peter R. Wurman, Peter Stone

Designing reinforcement learning (RL) agents is typically a difficult process that requires numerous design iterations. Learning can fail for a multitude of reasons and standard RL methods provide too few tools to provide insight into the exact cause. In this paper, we show ...

NEURIPS, 2022

Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling

Junyuan Hong, Lingjuan Lyu, Jiayu Zhou*, Michael Spranger

As deep learning blooms with growing demand for computation and data resources, outsourcing model training to a powerful cloud server becomes an attractive alternative to training at a low-power and cost-effective end device. Traditional outsourcing requires uploading device...

NEURIPS, 2022

Calibrated Federated Adversarial Training with Label Skewness

Chen Chen, Yuchen Liu*, Xingjun Ma*, Lingjuan Lyu

Recent studies have shown that, like traditional machine learning, federated learning (FL) is also vulnerable to adversarial attacks. To improve the adversarial robustness of FL, few federated adversarial training (FAT) methods have been proposed to apply adversarial training...

NEURIPS, 2022

DENSE: Data-Free One-Shot Federated Learning

Jie Zhang*, Chen Chen, Bo Li*, Lingjuan Lyu, Shuang Wu*, Shouhong Ding*, Chunhua Shen*, Chao Wu*

One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round. Despite the low communication cost, existing one-shot FL methods are mostly impractical or face inherent limitatio...

NEURIPS, 2022

CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks

Xuanli He*, Qiongkai Xu*, Yi Zeng, Lingjuan Lyu, Fangzhao Wu*, Jiwei Li*, Ruoxi Jia*

Previous works have validated that text generation APIs can be stolen through imitation attacks, causing IP violations. In order to protect the IP of text generation APIs, a recent work has introduced a watermarking algorithm and utilized the null-hypothesis test as a post-h...

NEURIPS, 2022

Prompt Certified Machine Unlearning with Randomized Gradient Smoothing and Quantization

Zijie Zhang*, Xin Zhao*, Tianshi Che*, Yang Zhou*, Lingjuan Lyu

The right to be forgotten calls for efficient machine unlearning techniques that make trained machine learning models forget a cohort of data. The combination of training and unlearning operations in traditional machine unlearning methods often leads to the expensive computa...

NEURIPS, 2022

FairVFL: A Fair Vertical Federated Learning Framework with Contrastive Adversarial Learning

Tao Qi*, Fangzhao Wu*, Chuhan Wu*, Lingjuan Lyu, Tong Xu*, Hao Liao*, Zhongliang Yang*, Yongfeng Huang*, Xing Xie*

Vertical federated learning (VFL) is a privacy-preserving machine learning paradigm that can learn models from features distributed on different platforms in a privacy-preserving way. Since in real-world applications the data may contain bias on fairness-sensitive features (...

JOURNAL OF MACHINE LEARNING RESEARCH, 2022

d3rlpy: An Offline Deep Reinforcement Learning Library

Takuma Seno, Michita Imai*

In this paper, we introduce d3rlpy, an open-sourced offline deep reinforcement learning (RL) library for Python. d3rlpy supports a set of offline deep RL algorithms as well as off-policy online algorithms via a fully documented plug-and-play API. To address a reproducibility...

CIKM, 2022

RecipeMind: Guiding Ingredient Choices from Food Pairing to Recipe Completion using Cascaded Set Transformer

We propose a computational approach for recipe ideation, a downstream task that helps users select and gather ingredients for creating dishes. To perform this task, we developed RecipeMind, a food affinity score prediction model that quantifies the suitability of adding an i...

BMVC, 2022

Content-Diverse Comparisons improve IQA

William Thong, Jose Costa Pereira*, Sarah Parisot*, Ales Leonardis*, Steven McDonagh*

Image quality assessment (IQA) forms a natural and often straightforward undertaking for humans, yet effective automation of the task remains highly challenging. Recent metrics from the deep learning community commonly compare image pairs during training to improve upon trad...

IEEE TAI, 2022

Prototype-based Interpretable Graph Neural Networks

Alessio Ragno*, Biagio La Rosa*, Roberto Capobianco

Graph neural networks have proved to be a key tool for dealing with many problems and domains such as chemistry, natural language processing and social networks. While the structure of the layers is simple, it is difficult to identify the patterns learned by the graph neural...

ICDM, 2022

FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation

Ziqing Fan*, Yanfeng Wang*, Jiangchao Yao*, Lingjuan Lyu, Ya Zhang*, Qi Tian*

The statistical heterogeneity of the non-independent and identically distributed (non-IID) data in local clients significantly limits the performance of federated learning. Previous attempts like FedProx, SCAFFOLD, MOON, FedNova and FedDyn resort to an optimization perspecti...

TNNLS, 2022

Privacy and Robustness in Federated Learning: Attacks and Defenses

Lingjuan Lyu, Han Yu*, Xingjun Ma*, Chen Chen, Lichao Sun*, Jun Zhao*, Qiang Yang*, Philip S. Yu*

As data are increasingly stored in different silos and societies become more aware of data privacy issues, the traditional centralized training of artificial intelligence (AI) models is facing efficiency and privacy challenges. Recently, federated learning (FL) has ...

ECCV, 2022

Human-Centric Visual Diversity Auditing

Jerone T. A. Andrews, Przemyslaw Joniak*, Alice Xiang

EAAMO, 2022

Attrition of Workers with Minoritized Identities on AI Teams

Jeffrey Brown*, Tina Park*, Jiyoo Chang*, McKane Andrus*, Alice Xiang, Christine Custis*

The effects of AI systems are far-reaching and affect diverse communities all over the world. The demographics of AI teams, however, do not reflect this diversity. Instead, these teams, particularly at big tech companies, are dominated by Western, White, and male workers...

COLING, 2022

Beyond Model Extraction: Imitation Attack for Black-Box NLP APIs

Qiongkai Xu*, Xuanli He*, Lingjuan Lyu, Lizhen Qu*, Gholamreza Haffari*

Machine-learning-as-a-service (MLaaS) has attracted millions of users to their splendid large-scale models. Although published as black-box APIs, the valuable models behind these services are still vulnerable to imitation attacks. Recently, a series of works have demonstrate...

CIKM, 2022

Cross-Network Social User Embedding with Hybrid Differential Privacy Guarantees

Jiaqian Ren*, Lei Jiang*, Hao Peng*, Lingjuan Lyu, Zhiwei Liu*, Chaochao Chen*, Jia Wu*, Xu Bai*, Philip S. Yu*

Integrating multiple online social networks (OSNs) has important implications for many downstream social mining tasks, such as user preference modelling, recommendation, and link prediction. However, it is unfortunately accompanied by growing privacy concerns about leaking s...

IROS, 2022

Quantifying Changes in Kinematic Behavior of a Human-Exoskeleton Interactive System

Keya Ghonasgi*, Reuth Mirsky*, Adrian M Haith*, Peter Stone, Ashish D Deshpande*

While human-robot interaction studies are becoming more common, quantification of the effects of repeated interaction with an exoskeleton remains unexplored. We draw upon existing literature in human skill assessment and present extrinsic and intrinsic performance metrics t...

EAAMO, 2022

AI-Competent Individuals and Laypeople Tend to Oppose Facial Analysis AI

Chiara Ullstein*, Severin Engelmann*, Orestis Papakyriakopoulos*, Michel Hohendanner*, Jens Grossklags*

Recent advances in computer vision analysis have led to a debate about the kinds of conclusions artificial intelligence (AI) should make about people based on their faces. Some scholars have argued for supposedly “common sense” facial inferences that can be reliably drawn ...

EMNLP, 2022

Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models

Zhiyuan Zhang*, Lingjuan Lyu, Xingjun Ma*, Chenguang Wang*, Xu Sun*

Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks. In Natural Language Processing (NLP), DNNs are often backdoored during the fine-tuning process of a large-scale Pre-trained Language Model (PLM) with poisoned samples. Although the clean weights of P...

EMNLP, 2022

Extracted BERT Model Leaks More Information than You Think!

Xuanli He*, Chen Chen, Lingjuan Lyu, Qiongkai Xu*

The collection and availability of big data, combined with advances in pre-trained models (e.g. BERT), have revolutionized the predictive performance of natural language processing tasks. This allows corporations to provide machine learning as a service (MLaaS) by encapsulat...

NESY, 2022

Grounding LTLf specifications in images

Elena Umili*, Roberto Capobianco, Giuseppe De Giacomo*

A critical challenge in neurosymbolic approaches is to handle the symbol grounding problem without direct supervision. That is mapping high-dimensional raw data into an interpretation over a finite set of abstract concepts with a known meaning, without using labels. In this ...

ICML, 2022

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

Yuhta Takida, Takashi Shibuya, Wei-Hsiang Liao, Chieh-Hsin Lai, Junki Ohmura*, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi*, Toshiyuki Kumakura*, Yuki Mitsufuji

One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, also known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some...

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2022

Ligand-based and structure-based studies to develop predictive models for SARS-CoV-2 main protease inhibitors through the 3d-qsar.com portal

Eleonora Proia*, Alessio Ragno*, Lorenzo Antonini*, Manuela Sabatino*, Milan Mladenovič*, Roberto Capobianco, Rino Ragno*

The main protease (Mpro) of SARS-CoV-2 is the essential enzyme for maturation of functional proteins implicated in viral replication and transcription. The peculiarity of its specific cleavage site joint with its high degree of conservation among all coronaviruses promote it...

AII, 2022

Deep Reinforcement Learning for Pin-Point Autonomous Lunar Landing: Trajectory Recalculation for Obstacle Avoidance

Giulia Ciabatti*, Dario Spiller, Shreyansh Daftry*, Roberto Capobianco, Fabio Curti*

This work aims to present a method to perform autonomous precision landing—pin-point landing—on a planetary environment and perform trajectory recalculation for fault recovery where necessary. In order to achieve this, we choose to implement a Deep Reinforcement Learning—DRL...

ICCC, 2022

Interpretable Relational Representations for Food Ingredient Recommendation Systems

Kana Maruyama, Michael Spranger

Supporting chefs with ingredient recommender systems to create new recipes is challenging, as good ingredient combinations depend on many factors like taste, smell, cuisine style, texture, chef’s preference and many more. Useful machine learning models do need to be accurate...

ICML, 2022

Privacy for Free: How does Dataset Condensation Help Privacy?

Tian Dong, Bo Zhao*, Lingjuan Lyu

To prevent unintentional data leakage, research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of the data privacy, existing solutions suffer from either expensive training cost or poor general...

ICML, 2022

Accelerated Federated Learning with Decoupled Adaptive Optimization

Jiayin Jin*, Jiaxiang Ren*, Yang Zhou*, Lingjuan Lyu, Ji Liu*, Dejing Dou*

The federated learning (FL) framework enables edge clients to collaboratively learn a shared inference model while keeping privacy of training data on clients. Recently, many heuristics efforts have been made to generalize centralized adaptive optimization methods, such as S...

NATURE COMMUNICATIONS, 2022

A Federated Graph Neural Network Framework for Privacy-Preserving Personalization

Yongfeng Huang*, Chuhan Wu*, Fangzhao Wu*, Lingjuan Lyu, Tao Qi*, Xing Xie*

Graph neural network (GNN) is effective in modeling high-order interactions and has been widely used in various personalized applications such as recommendation. However, mainstream personalization methods rely on centralized GNN learning on global graphs, which have conside...

ICASSP, 2022

Heterogeneous Graph Node Classification with Multi-Hops Relation Features

Xiaolong Xu*, Lingjuan Lyu, Hong Jin*, Weiqiang Wang*, Shuo Jia*

In recent years, knowledge graph (KG) has obtained many achievements in both research and industrial fields. However, most KG algorithms consider node embedding with only structure and node features, but not relation features. In this paper, we propose a novel Heterogeneous ...

ICASSP, 2022

Distributed Graph Learning with Smooth Priors

Isabela Cunha Maia Nobre*, Mireille El Gheche, Pascal Frossard*

Graph learning is often a necessary step in processing or representing structured data, when the underlying graph is not given explicitly. Graph learning is generally performed centrally with a full knowledge of the graph signals, namely the data that lives on the graph node...

ICLR, 2022

How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data

Zhiyuan Zhang*, Lingjuan Lyu, Weiqiang Wang*, Lichao Sun*, Xu Sun*

Since training a large-scale backdoored model from scratch requires a large training dataset, several recent attacks have considered to inject backdoors into a trained clean model without altering model behaviors on the clean data. Previous work finds that backdoors can be i...

IJCAI, 2022

Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification

Chaochao Chen*, Longfei Zheng*, Huiwen Wu*, Lingjuan Lyu, Jun Zhou*, Jia Wu*, Bingzhe Wu*, Ziqi Liu*, Li Wang*, Xiaolin Zheng*

Graph Neural Network (GNN) has achieved remarkable progresses in various real-world tasks on graph data. High-performance GNN models always depend on both rich features and complete edge information in graph. However, such information could possibly be isolated by different ...

IJCAI, 2022

Dynamic Sparse Training for Deep Reinforcement Learning

Ghada Sokar, Elena Mocanu, Decebal Constantin Mocanu, Mykola Pechenizkiy, Peter Stone

Deep reinforcement learning (DRL) agents are trained through trial-and-error interactions with the environment. This leads to a long training time for dense neural networks to achieve good performance. Hence, prohibitive computation and memory resources are consumed. Recentl...

IJCAI, 2022

Data-Free Adversarial Knowledge Distillation for Graph Neural Networks

Yuanxin Zhuang*, Lingjuan Lyu, Chuan Shi*, Carl Yang*, Lichao Sun*

Graph neural networks (GNNs) have been widely used in modeling graph structured data, owing to its impressive performance in a wide range of practical applications. Recently, knowledge distillation (KD) for GNNs has enabled remarkable progress in graph model compression and ...

TDSC, 2022

Decision Boundary-aware Data Augmentation for Adversarial Training

Chen Chen, Jingfeng Zhang*, Xilie Xu*, Lingjuan Lyu, Chaochao Chen*, Tianlei Hu*, Gang Chen*

Adversarial training (AT) is a typical method to learn adversarially robust deep neural networks via training on the adversarial variants generated by their natural examples. However, as training progresses, the training data becomes less attackable, which may undermine the ...

TSIPN, 2022

Wasserstein-based Graph Alignment

Hermina Petric Maretic*, Mireille El Gheche, Giovanni Chierchia*, Pascal Frossard*

We propose a novel method for comparing non-aligned graphs of different sizes, based on the Wasserstein distance between graph signal distributions induced by the respective graph Laplacian matrices. Specifically, we cast a new formulation for the one-to-many graph alignment...

NATURE COMMUNICATIONS, 2022

Communication-Efficient Federated Learning via Knowledge Distillation

Yongfeng Huang*, Chuhan Wu*, Fangzhao Wu*, Lingjuan Lyu, Xing Xie*

Federated learning is a privacy-preserving machine learning technique to train intelligent models from decentralized data, which enables exploiting private data by communicating local model updates in each iteration of model learning rather than the raw data. However, model ...

IEEE TRANSACTIONS ON BIG DATA, 2022

Practical Attribute Reconstruction Attack Against Federated Learning

Chen Chen, Lingjuan Lyu, Han Yu*, Gang Chen*

Existing federated learning (FL) designs have been shown to exhibit vulnerabilities which can be exploited by adversaries to compromise data privacy. However, most current works conduct attacks by leveraging gradients calculated on a small batch of data. This setting is not ...

IEEE HAPTICS SYMPOSIUM, 2022

Enhancing Haptic Distinguishability of Surface Materials with Boosting Technique

Priyadarshini Kumari, Subhasis Chaudhuri*

Discriminative features are crucial for several learning applications, such as object detection and classification. Neural networks are extensively used for extracting discriminative features of images and speech signals. However, the lack of large datasets in the haptics d...

NEURAL NETWORKS, 2022

Selective particle attention: Rapidly and flexibly selecting features for deep reinforcement learning

Sam Blakeman, Denis Mareschal*

Deep Reinforcement Learning (RL) is often criticized for being data inefficient and inflexible to changes in task structure. Part of the reason for these issues is that Deep RL typically learns end-to-end using backpropagation, which results in task-specific representations....

TKDE, 2022

Traffic Anomaly Prediction Based on Joint Static-Dynamic Spatio-Temporal Evolutionary Learning

Xiaoming Liu*, Zhanwei Zhang*, Lingjuan Lyu, Zhaohan Zhang*, Shuai Xiao*, Chao Shen*, Philip Yu*

Accurate traffic anomaly prediction offers an opportunity to save the wounded at the right location in time. However, the complex process of traffic anomaly is affected by both various static factors and dynamic interactions. The recent evolving representation learning provi...

NATURE, 2022

Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning

Pete Wurman, Samuel Barrett, Kenta Kawamoto, James MacGlashan, Kaushik Subramanian, Thomas Walsh, Roberto Capobianco, Alisa Devlic, Franziska Eckert, Florian Fuchs, Leilani Gilpin, Piyush Khandelwal, Varun Kompella, Hao Chih Lin, Patrick MacAlpine, Declan Oller, Takuma Seno, Craig Sherstan, Michael D. Thomure, Houmehr Aghabozorgi, Leon Barrett, Rory Douglas, Dion Whitehead Amago, Peter Dürr, Peter Stone, Michael Spranger, Hiroaki Kitano

Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block...

WWW, 2022

Differential Private Knowledge Transfer for Privacy-Preserving Cross-Domain Recommendation

Chaochao Chen*, Huiwen Wu*, Jiajie Su*, Lingjuan Lyu, Xiaolin Zheng*, Li Wang*

Cross Domain Recommendation (CDR) has been popularly studied to alleviate the cold-start and data sparsity problem commonly existed in recommender systems. CDR models can improve the recommendation performance of a target domain by leveraging the data of other source domains...

AAAI, 2022

GEAR: A Margin-based Federated Adversarial Training Approach

Chen Chen, Jie Zhang*, Lingjuan Lyu

Previous studies have shown that federated learning (FL) is vulnerable to well-crafted adversarial examples. Some recent efforts tried to combine adversarial training with FL, i.e., federated adversarial training (FAT), in order to achieve adversarial robustness in FL. Howev...

AAAI, 2022

Byzantine-resilient Federated Learning via Gradient Memorization

Chen Chen, Lingjuan Lyu, Yuchen Liu*, Fangzhao Wu*, Chaochao Chen*, Gang Chen*

Federated learning (FL) provides a privacy-aware learning framework by enabling a multitude of participants to jointly construct models without collecting their private training data. However, federated learning has exhibited vulnerabilities to Byzantine attacks. Many existi...

ACM TIST, 2022

FedBERT: When Federated Learning Meets Pre-Training

Yuanyishu Tian*, Yao Wan*, Lingjuan Lyu, Dezhong Yao*, Hai Jin*, Lichao Sun*

The fast growth of pre-trained models (PTMs) has brought natural language processing to a new era, which becomes a dominant technique for various natural language processing (NLP) applications. Every user can download weights of PTMs, then fine-tune the weights on a task on ...

ACM TIST, 2022

FedCTR: Federated Native Ad CTR Prediction with Cross Platform User Behavior Data

Chuhan Wu*, Fangzhao Wu*, Lingjuan Lyu, Yongfeng Huang*, Xing Xie*

Native ad is a popular type of online advertisement which has similar forms with the native content displayed on websites. Native ad CTR prediction is useful for improving user experience and platform revenue. However, it is challenging due to the lack of explicit user inten...

AAAI, 2022

DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent Transportation

Yu Guo*, Wen Liu*, Jiangtian Nie*, Lingjuan Lyu, Zehui Xiong*, Jiawen Kang*, Han Yu*, Dusit Niyato*

Visual surveillance technology is an indispensable functional component of advanced traffic management systems. It has been applied to perform traffic supervision tasks, such as object detection, tracking and recognition. However, adverse weather conditions, e.g., fog, haze ...

AAAI, 2022

Protecting Intellectual Property of Language Generation APIs with Lexical Watermark

Xuanli He*, Qiongkai Xu*, Lingjuan Lyu, Fangzhao Wu*, Chenguang Wang*

Nowadays, due to the breakthrough in natural language generation (NLG), including machine translation, document summarization, image captioning, etc., NLG models have been encapsulated in cloud APIs to serve over half a billion people worldwide and process over one hundred bil...

AIAA SCITECH FORUM, 2022

Planetary Environment Prediction Using Generative Modeling

Shrijit Singh*, Shreyansh Daftry*, Roberto Capobianco

Planetary rovers have a limited sensory horizon and operate in environments where limited information about the surrounding terrain is available. The rough and unknown nature of the terrain in planetary environments potentially leads to scenarios where the rover gets stuck an...

AAAI, 2022

fGOT: Graph Distances based on Filters and Optimal Transport

Hermina Petric Maretic*, Mireille El Gheche, Giovanni Chierchia*, Pascal Frossard*

Graph comparison deals with identifying similarities and dissimilarities between graphs. A major obstacle is the unknown alignment of graphs, as well as the lack of accurate and inexpensive comparison metrics. In this work we introduce the filter graph distance. It is an opt...

ARTIFICIAL INTELLIGENCE, 2022

Logic Tensor Networks

Samy Badreddine, Artur d'Avila Garcez*, Luciano Serafini*, Michael Spranger

Attempts at combining logic and neural networks into neurosymbolic approaches have been on the increase in recent years. In a neurosymbolic system, symbolic knowledge assists deep learning, which typically uses a sub-symbolic distributed representation, to learn and reason a...

NEURIPS, 2021

Exploiting Data Sparsity in Secure Cross-Platform Social Recommendation

Jamie Cui*, Chaochao Chen*, Lingjuan Lyu, Carl Yang*, Li Wang*

Social recommendation has shown promising improvements over traditional systems since it leverages social correlation data as an additional input. Most existing works assume that all data are available to the recommendation platform. However, in practice, user-item interacti...

NEURIPS, 2021

Anti-Backdoor Learning: Training Clean Models on Poisoned Data

Yige Li*, Xixiang Lyu*, Nodens Koren*, Lingjuan Lyu, Bo Li*, Xingjun Ma*

Backdoor attack has emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting and erasing backdoor triggers, it is still not clear if measures can be taken to avoid the triggers from bein...

NEURIPS, 2021

Gradient Driven Rewards to Guarantee Fairness in Collaborative Machine Learning

Xu Xinyi*, Lingjuan Lyu, Xingjun Ma*, Chenglin Miao*, Chuan-Sheng Foo*, Bryan Kian Hsiang Low*

Collaborative machine learning provides a promising framework for different agents to pool their resources (e.g., data) for a common learning task. In realistic settings where agents are self-interested and not altruistic, they may be unwilling to share data or model without...

NEURIPS, 2021

Expert Human-Level Driving in Gran Turismo Sport Using Deep Reinforcement Learning with Image-based Representation

Ryuji Imamura, Takuma Seno, Kenta Kawamoto, Michael Spranger

When humans play virtual racing games, they use visual environmental information on the game screen to understand the rules within the environments. In contrast, a state-of-the-art realistic racing game AI agent that outperforms human players does not use image-based environ...

NEURIPS, 2021

d3rlpy: An Offline Deep Reinforcement Learning Library

Takuma Seno, Michita Imai*

In this paper, we introduce d3rlpy, an open-sourced offline deep reinforcement learning (RL) library for Python. d3rlpy supports a number of offline deep RL algorithms as well as online algorithms via a user-friendly API. To assist deep RL research and development projects, ...

AIXIA, 2021

Tafl-ES: Exploring Evolution Strategies for Asymmetrical Board Games

Roberto Gallotta*, Roberto Capobianco

NeuroEvolution Strategies (NES) are a subclass of Evolution Strategies (ES). While their application to games and board games have been studied in the past [11], current state of the art in most of the games is still held by classic RL models, such as AlphaGo Zero [16]. This...

AIXIA, 2021

Exploration-Intensive Distractors: Two Environment Proposals and a Benchmarking

Jim Martin Catacora Ocaña*, Roberto Capobianco, Daniele Nardi*

Sparse-reward environments are famously challenging for deep reinforcement learning (DRL) algorithms. Yet, the prospect of solving intrinsically sparse tasks in an end-to-end fashion without any extra reward engineering is highly appealing. Such aspiration has recently led t...

AIXIA, 2021

Detection Accuracy for Evaluating Compositional Explanations of Units

Sayo M. Makinwa*, Biagio La Rosa*, Roberto Capobianco

The recent success of deep learning models in solving complex problems and in different domains has increased interest in understanding what they learn. Therefore, different approaches have been employed to explain these models, one of which uses human-understandable concept...

AIXIA, 2021

A Discussion about Explainable Inference on Sequential Data via Memory-Tracking

Biagio La Rosa*, Roberto Capobianco, Daniele Nardi*

The recent explosion of deep learning techniques boosted the application of Artificial Intelligence in a variety of domains, thanks to their high performance. However, performance comes at the cost of interpretability: deep models contain hundreds of nested non-linear operati...

IEEE IOT-J, 2021

Data Poisoning Attacks on Federated Machine Learning

Gan Sun*, Yang Cong*, Jiahua Dong*, Qiang Wang*, Lingjuan Lyu, Ji Liu*

Federated machine learning which enables resource-constrained node devices (e.g., Internet of Things (IoT) devices, smartphones) to establish a knowledge-shared model while keeping the raw data local, could provide privacy preservation and economic benefit by designing an ef...

IEEE TNNLS, 2021

Joint Stance and Rumor Detection in Hierarchical Heterogeneous Graph

Chen Li*, Hao Peng*, Jianxin Li*, Lichao Sun*, Lingjuan Lyu, Lihong Wang*, Philip Yu*, Lifang He*

Recently, large volumes of false or unverified information (e.g., fake news and rumors) appear frequently in emerging social media, which are often discussed on a large scale and widely disseminated, causing bad consequences. Many studies on rumor detection indicate that the...

BMVC, 2021

Feature and Label Embedding Spaces Matter in Addressing Image Classifier Bias

William Thong, Cees Snoek*

This paper strives to address image classifier bias, with a focus on both feature and label embedding spaces. Previous works have shown that spurious correlations from protected attributes, such as age, gender, or skin tone, can cause adverse decisions. To balance potential ...

ASCEND, 2021

Learning Transferable Policies for Autonomous Planetary Landing via Deep Reinforcement Learning

Giulia Ciabatti*, Shreyansh Daftry*, Roberto Capobianco

In this work, we develop an application for autonomous landing, exploiting the properties of Deep Reinforcement Learning and Transfer Learning in order to tackle the problem of planetary landing on unknown or barely-known extra-terrestrial environments by learning good-perfo...

IEEE ACCESS, 2021

RecipeBowl: A Cooking Recommender for Ingredients and Recipes using Set Transformer

Michael Spranger, Kana Maruyama

Countless possibilities of recipe combinations challenge us to determine which additional ingredient goes well with others. In this work, we propose RecipeBowl which is a cooking recommendation system that takes a set of ingredients and cooking tags as input and suggests pos...

IJCLR, 2021

Extending Real Logic with Aggregate Functions

Samy Badreddine, Michael Spranger

Real Logic is a recently introduced first-order language where formulas have fuzzy truth values in the interval [0, 1] and semantics are defined concretely with real domains. The Logic Tensor Networks (LTN) framework has applied Real Logic to many important AI tasks through ...

IEEE TII, 2021

FLEAM: A Federated Learning Empowered Architecture to Mitigate DDoS in Industrial IoT

Jianhua Li*, Lingjuan Lyu, Ximeng Liu*, Xuyun Zhang*, Xixiang Lyu*

IJCAI, 2021

A Novel Attribute Reconstruction Attack in Federated Learning

Lingjuan Lyu, Chen Chen

Federated learning (FL) emerged as a promising learning paradigm to enable a multitude of participants to construct a joint ML model without exposing their private training data. Existing FL designs have been shown to exhibit vulnerabilities which can be exploited by adv...

IJCAI, JAIR, 2021

Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog

Jesse Thomason*, Aishwarya Padmakumar*, Jivko Sinapov*, Nick Walker*, Yuqian Jiang*, Harel Yedidsion*, Justin Hart*, Peter Stone, Raymond J. Mooney*

In this work, we present methods for using human-robot dialog to improve language understanding for a mobile robot agent. The agent parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red a...

JAIR, 2021

Agent-Based Markov Modeling for Improved COVID-19 Mitigation Policies

Roberto Capobianco, Varun Kompella, James Ault*, Guni Sharon*, Stacy Jong*, Spencer Fox*, Lauren Meyers*, Pete Wurman, Peter Stone

The year 2020 saw the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world have been faced with the challenge of protecting public health while keeping the economy running to the greatest extent possible. Epidemiologi...

TENNESSEE LAW REVIEW, 2021

Reconciling Legal and Technical Approaches to Algorithmic Bias

Alice Xiang

In recent years, there has been a proliferation of papers in the algorithmic fairness literature proposing various technical definitions of algorithmic bias and methods to mitigate bias. Whether these algorithmic bias mitigation methods would be permissible from a legal pers...

CVPR, 2021

Autonomous Planetary Landing via Deep Reinforcement Learning and Transfer Learning

Giulia Ciabatti*, Shreyansh Daftry*, Roberto Capobianco

The aim of this work is to develop an application for autonomous landing. We exploit the properties of Deep Reinforcement Learning and Transfer Learning, in order to tackle the problem of planetary landing on unknown or barely-known extra-terrestrial environments by learning...

ICRA, 2021

Autonomous Overtaking in Gran Turismo Sport Using Curriculum Reinforcement Learning

Yunlong Song*, Hao Chih Lin, Elia Kaufmann*, Peter Dürr, Davide Scaramuzza*

Professional race-car drivers can execute extreme overtaking maneuvers. However, existing algorithms for autonomous overtaking either rely on simplified assumptions about the vehicle dynamics or try to solve expensive trajectory-optimization problems online. When the vehicle...

ICRA, 2021

Efficient Real-Time Inference in Temporal Convolution Networks

Piyush Khandelwal, James MacGlashan, Pete Wurman, Peter Stone

It has been recently demonstrated that Temporal Convolution Networks (TCNs) provide state-of-the-art results in many problem domains where the input data is a time-series. TCNs typically incorporate information from a long history of inputs (the receptive field) into a singl...

AIES, 2021

On the Validity of Arrest as a Proxy for Offense: Race and the Likelihood of Arrest for Violent Crimes

Riccardo Fogliato*, Alice Xiang, Zachary Lipton*, Daniel Nagin*, Alexandra Chouldechova*

The risk of re-offense is considered in decision-making at many stages of the criminal justice system, from pre-trial, to sentencing, to parole. To aid decision makers in their assessments, institutions increasingly rely on algorithmic risk assessment instruments (RAIs). The...

AIES, 2021

Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty

Umang Bhatt*, Javier Antorán*, Yunfeng Zhang*, Q. Vera Liao*, Prasanna Sattigeri*, Riccardo Fogliato*, Gabrielle Melançon*, Ranganath Krishnan*, Jason Stanley*, Omesh Tickoo*, Lama Nachman*, Rumi Chunara*, Madhu*

Algorithmic transparency entails exposing system properties to various stakeholders for purposes that include understanding, improving, and contesting predictions. Until now, most research into algorithmic transparency has predominantly focused on explainability. Explainabil...

AAMAS, 2021

Multiagent Epidemiologic Inference through Realtime Contact Tracing

Guni Sharon*, James Ault*, Peter Stone, Varun Kompella, Roberto Capobianco

This paper addresses an epidemiologic inference problem where, given realtime observation of test results, presence of symptoms, and physical contacts, the most likely infected individuals need to be inferred. The inference problem is modeled as a hidden Markov model where inf...

IEEE RA-L, ICRA, 2021

Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning

Florian Fuchs, Yunlong Song*, Elia Kaufmann*, Davide Scaramuzza*, Peter Dürr

Autonomous car racing is a major challenge in robotics. It raises fundamental problems for classical approaches such as planning minimum-time trajectories under uncertain dynamics and controlling the car at the limits of its handling. Besides, the requirement of minimizing t...

FACCT, 2021

"What We Can’t Measure, We Can’t Understand": Challenges to Demographic Data Procurement in the Pursuit of Fairness

McKane Andrus*, Elena Spitzer*, Jeffrey Brown*, Alice Xiang

As calls for fair and unbiased algorithmic systems increase, so too does the number of individuals working on algorithmic fairness in industry. However, these practitioners often do not have access to the demographic data they feel they need to detect bias in practice. Even ...

AAAI, 2021

Expected Value of Communication for Planning in Ad Hoc Teamwork

William Macke*, Reuth Mirsky*, Peter Stone

A desirable goal for autonomous agents is to be able to coordinate on the fly with previously unknown teammates. Known as "ad hoc teamwork", enabling such a capability has been receiving increasing attention in the research community. One of the central challenges in ad hoc ...

AAAI, 2021

Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks

Yuqian Jiang*, Sudarshanan Bharadwaj*, Bo Wu*, Rishi Shah*, Ufuk Topcu*, Peter Stone

In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted reward formulation. As usual, learning an optimal policy in this setting typically requires a large amount of training experiences. Reward...

AAAI, 2021

Goal Blending for Responsive Shared Autonomy in a Navigating Vehicle

Yu-Sian Jiang*, Garrett Warnell*, Peter Stone

Human-robot shared autonomy techniques for vehicle navigation hold promise for reducing a human driver's workload, ensuring safety, and improving navigation efficiency. However, because typical techniques achieve these improvements by effectively removing human control at cr...

IJCAI, 2021

A Penny for Your Thoughts: The Value of Communication in Ad Hoc Teamwork

Reuth Mirsky*, William Macke*, Andy Wang*, Harel Yedidsion*, Peter Stone

In ad hoc teamwork, multiple agents need to collaborate without having knowledge about their teammates or their plans a priori. A common assumption in this research area is that the agents cannot communicate. However, just as two random people may speak the same language, au...

IJCAI, 2021

Balancing Individual Preferences and Shared Objectives in Multiagent Reinforcement Learning

Ishan Durugkar, Elad Liebman*, Peter Stone

In multiagent reinforcement learning scenarios, it is often the case that independent agents must jointly learn to perform a cooperative task. This paper focuses on such a scenario in which agents have individual preferences regarding how to accomplish the shared task. We co...

IJCAI, 2021

Explainable Inference on Sequential Data via Memory-Tracking

Biagio La Rosa*, Roberto Capobianco, Daniele Nardi*

In this paper we present a novel mechanism to get explanations that allow to better understand network predictions when dealing with sequential data. Specifically, we adopt memory-based networks — Differential Neural Computers — to exploit their capability of storing data in...

NEURIPS, 2020

Assessing SATNet's Ability to Solve the Symbol Grounding Problem

Michael Spranger, Oscar Chang*, Lampros Flokas*, Hod Lipson*

SATNet is an award-winning MAXSAT solver that can be used to infer logical rules and integrated as a differentiable layer in a deep neural network. It had been shown to solve Sudoku puzzles visually from examples of puzzle digit images, and was heralded as an impressive achi...

NEURIPS, 2020

Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation

Uchenna Akujuobi, Jun Chen*, Mohamed Elhoseiny*, Michael Spranger, Xiangliang Zhang*

Understanding the relationships between biomedical terms like viruses, drugs, and symptoms is essential in the fight against diseases. Many attempts have been made to introduce the use of machine learning to the scientific process of hypothesis generation (HG), which refers ...

NEURIPS, 2020

Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks

Lemeng Wu*, Bo Liu*, Peter Stone, Qiang Liu*

We propose firefly neural architecture descent, a general framework for progressively and dynamically growing neural networks to jointly optimize the networks' parameters and architectures. Our method works in a steepest descent fashion, which iteratively finds the best netw...

NEURIPS, 2020

An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

Siddharth Desai*, Ishan Durugkar, Haresh Karnan*, Garrett Warnell*, Josiah Hanna*, Peter Stone

We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning. This problem is par...

AAAI AI FOR SOCIAL GOOD, 2020

Reinforcement Learning for Optimization of COVID-19 Mitigation Policies

Varun Kompella, Roberto Capobianco, Stacy Jong*, Jonathan Browne*, Spencer Fox*, Lauren Meyers*, Pete Wurman, Peter Stone

The year 2020 has seen the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world are faced with the challenge of protecting public health, while keeping the economy running to the greatest extent possible. Epidemiologi...

IEEE TKDE, 2020

T-PAIR: Temporal node-pair embedding for automatic biomedical hypothesis generation

Uchenna Akujuobi, Michael Spranger, Sucheendra K Palaniappan*, Xiangliang Zhang*

In this paper, we study an automatic hypothesis generation (HG) problem, which refers to the discovery of meaningful implicit connections between scientific terms, including but not limited to diseases, chemicals, drugs, and genes extracted from databases of biomedical publica...