Publications
Diffusion-based Signal Refiner for Speech Enhancement and Separation
Ryosuke Sawata, Masato Hirano*, Naoki Murata, Shusuke Takahashi*, Yuki Mitsufuji
Although recent speech processing technologies have achieved significant improvements in objective metrics, there still remains a gap in human perceptual quality. This paper proposes Diffiner, a novel solution that utilizes the powerful generative capability of diffusion mod...
S-PRESSO: Ultra Low Bitrate Sound Effect Compression With Diffusion Autoencoders And Offline Quantization
Zineb Lahrichi, Gaëtan Hadjeres, Gaël Richard, Geoffroy Peeters
Neural audio compression models have recently achieved extreme compression rates, enabling efficient latent generative modeling. Conversely, latent generative models have been applied to compression, pushing the limits of continuous and discrete approaches. However, existing...
PAVAS: Physics-Aware Video-to-Audio Synthesis
Oh Hyun-Bin*, Yuhta Takida, Toshimitsu Uesaka, Tae-Hyun Oh*, Yuki Mitsufuji
Recent advances in Video-to-Audio (V2A) generation have achieved impressive perceptual quality and temporal synchronization, yet most models remain appearance-driven, capturing visual-acoustic correlations without considering the physical factors that shape real-world sounds...
MeanFlow Transformers with Representation Autoencoders
Zheyuan Hu*, Chieh-Hsin Lai, Ge Wu*, Yuki Mitsufuji, Stefano Ermon*
MeanFlow (MF) is a diffusion-motivated generative model that enables efficient few-step generation by learning long jumps directly from noise to data. In practice, it is often used as a latent MF by leveraging the pre-trained Stable Diffusion variational autoencoder (SD-VAE)...
REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion
Ishan Biyani, Nirmesh Shah*, Ashishkumar Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah
Speech time reversal refers to the process of reversing the entire speech signal in time, causing it to play backward. Such signals are completely unintelligible since the fundamental structures of phonemes and syllables are destroyed. However, they still retain tonal patter...
GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data
Zhiteng Li, Lele Chen, Jerone Andrews, Yunhao Ba, Yulun Zhang, Alice Xiang
We propose a generative agent that augments training datasets with synthetic data for model fine-tuning. Unlike prior work, which uniformly samples synthetic data, our agent iteratively generates relevant samples on-the-fly, aligning with the target distribution. It prioritizes...
From Neural Networks to Logical Theories: The Correspondence between Fibring Modal Logics and Fibring Neural Networks
Ouns El Harzli, Bernardo Cuenca Grau*, Artur d'Avila Garcez*, Ian Horrocks, Tarek R Besold
Fibring of modal logics is a well-established formalism for combining countable families of modal logics into a single fibred language with common semantics, characterized by fibred models. Inspired by this formalism, fibring of neural networks was introduced as a neurosymbo...
Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models
Kevin Rojas*, Ye He*, Chieh-Hsin Lai, Yuhta Takida, Yuki Mitsufuji, Molei Tao*
Classifier-Free Guidance (CFG) is a widely used technique for conditional generation and improving sample quality in continuous diffusion models, and recent works have extended it to discrete diffusion. This paper theoretically analyzes CFG in the context of masked discrete ...
3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation
JoungBin Lee*, Jaewoo Jung*, Jisang Han*, Takuya Narihira, Kazumi Fukuda, Junyoung Seo*, Sunghwan Hong*, Yuki Mitsufuji, Seungryong Kim*
We present 3DScenePrompt, a framework that generates the next video chunk from arbitrary-length input while enabling precise camera control and preserving scene consistency. Unlike methods conditioned on a single image or a short clip, we employ dual spatio-temporal conditio...
LLM2Fx-Tools: Tool Calling For Music Post-Production
Seungheon Doh*, Junghyun Koo, Marco A. Martínez-Ramírez, Woosung Choi, Wei-Hsiang Liao, Qiyu Wu*, Juhan Nam*, Yuki Mitsufuji
This paper introduces LLM2Fx-Tools, a multimodal tool-calling framework that generates executable sequences of audio effects (Fx-chain) for music post-production. LLM2Fx-Tools uses a large language model (LLM) to understand audio inputs, select audio effects types, determine...
SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator
Yuhta Takida, Satoshi Hayakawa*, Takashi Shibuya, Masaaki Imaizumi*, Naoki Murata, Bac Nguyen, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuki Mitsufuji
Deep generative models have made significant advances in generating complex content, yet conditional generation remains a fundamental challenge. Existing conditional generative adversarial networks often struggle to balance the dual objectives of assessing authenticity and c...
Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
Bac Nguyen, Yuhta Takida, Naoki Murata, Chieh-Hsin Lai, Toshimitsu Uesaka, Stefano Ermon*, Yuki Mitsufuji
Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots and image content. We propose Contrastive Object-centric Diffusion Alignment (CODA), ...
Concept-TRAK: Understanding How Diffusion Models Learn Concepts through Concept-Level Attribution
Yonghyun Park*, Chieh-Hsin Lai, Satoshi Hayakawa*, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Woosung Choi, Kin Wai Cheuk, Junghyun Koo, Yuki Mitsufuji
While diffusion models excel at image generation, their growing adoption raises critical concerns around copyright issues and model transparency. Existing attribution methods identify training examples influencing an entire image, but fall short in isolating contributions to...
CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
Zheyuan Hu*, Chieh-Hsin Lai, Yuki Mitsufuji, Stefano Ermon*
Flow map models such as Consistency Models (CM) and Mean Flow (MF) enable few-step generation by learning the long jump of the ODE solution of diffusion models, yet training remains unstable, sensitive to hyperparameters, and costly. Initializing from a pre-trained diffusion...
VIRTUE: Visual-Interactive Text-Image Universal Embedder
Wei-Yao Wang*, Kazuya Tateishi*, Qiyu Wu*, Shusuke Takahashi*, Yuki Mitsufuji
Multimodal representation learning models have demonstrated successful operation across complex tasks, and the integration of vision-language models (VLMs) has further enabled embedding models with instruction-following capabilities. However, existing embedding models lack v...
Tracing the Principles Behind Modern Diffusion Models
Chieh-Hsin Lai, Yang Song*, Dongjun Kim*, Yuki Mitsufuji, Stefano Ermon*
Diffusion models can feel like a jungle of acronyms, but the core idea is simple: start from noise and gradually move a cloud of samples until it looks like real data. This post gives an intuition-first tour showing that DDPMs, score-based models, and flow matching are the s...
FoleyBench: A Benchmark For Video-to-Audio Models
Satvik Dixit, Koichi Saito, Zhi Zhong*, Yuki Mitsufuji, Chris Donahue
Video-to-audio generation (V2A) is of increasing importance in domains such as film post-production, AR/VR, and sound design, particularly for the creation of Foley sound effects synchronized with on-screen actions. Foley requires generating audio that is both semantically a...
Automatic Music Mixing Using a Generative Model of Effect Embeddings
Eloi Moliner, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Kin Wai Cheuk, Joan Serrà, Vesa Välimäki*, Yuki Mitsufuji
Music mixing involves combining individual tracks into a cohesive mixture, a task characterized by subjectivity where multiple valid solutions exist for the same input. Existing automatic mixing systems treat this task as a deterministic regression problem, thus ignoring thi...
Automatic Music Sample Identification with Multi-Track Contrastive Learning
Alain Riou, Joan Serrà, Yuki Mitsufuji
Sampling, the technique of reusing pieces of existing audio tracks to create new music content, is a very common practice in modern music production. In this paper, we tackle the challenging task of automatic sample identification, that is, detecting such sampled content and...
Leveraging Whisper Embeddings for Audio-based Lyrics Matching
Eleonora Mancini*, Joan Serrà, Paolo Torroni*
Audio-based lyrics matching can be an appealing alternative to other content-based retrieval approaches, but existing methods often suffer from limited reproducibility and inconsistent baselines. In this work, we introduce WEALY, a fully reproducible pipeline that leverages ...
MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation
Akira Takahashi*, Shusuke Takahashi*, Yuki Mitsufuji
We introduce MMAudioSep, a generative model for video/text-queried sound separation that is founded on a pretrained video-to-audio model. By leveraging knowledge about the relationship between video/text and audio learned through a pretrained audio generative model, we can t...
Towards Blind Data Cleaning: A Case Study in Music Source Separation
Azalea Gui, Woosung Choi, Junghyun Koo, Kazuki Shimada, Takashi Shibuya, Joan Serrà, Wei-Hsiang Liao, Yuki Mitsufuji
The performance of deep learning models for music source separation heavily depends on training data quality. However, datasets are often corrupted by difficult-to-detect artifacts such as audio bleeding and label noise. Since the type and extent of contamination are typical...
SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation
Kazuki Shimada, Christian Simon, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji
This work addresses the lack of multimodal generative models capable of producing high-quality videos with spatially aligned audio. While recent advancements in generative models have been successful in video generation, they often overlook the spatial alignment between audi...
Do Foundational Audio Encoders Understand Music Structure?
Keisuke Toyama*, Zhi Zhong*, Akira Takahashi*, Shusuke Takahashi*, Yuki Mitsufuji
In music information retrieval (MIR) research, the use of pretrained foundational audio encoders (FAEs) has recently become a trend. FAEs pretrained on large amounts of music and audio data have been shown to improve performance on MIR tasks such as music tagging and automat...
EyeO: Autocalibrating Gaze Output with Gaze Input for Gaze Typing
Akanksha Saran, Jacob Alber*, Cyril Zhang*, Ann Paradiso*, Danielle Bragg*, John Langford*
Gaze tracking devices have the potential to expand interactivity greatly, yet miscalibration remains a significant barrier to use. As devices miscalibrate, people tend to compensate by intentionally offsetting their gaze, which makes detecting miscalibration from eye signals...
Human-Interactive Robot Learning: Definition, Challenges, and Recommendations
Kim Baraka*, Ifrah Idrees*, Taylor Kessler Faulkner*, Erdem Biyik*, Serena Booth*, Mohamed Chetouani*, Daniel Grollman*, Akanksha Saran, Emmanuel Senft*, Silvia Tulli*, Anna-Lisa Vollmer*, Antonio Andriella*, Helen Beierling*, Tiffany Horter*, Jens Kober*, Isaac Sheidlower*, Matthew Taylor*, Sanne van Waveren*, Xuesu Xiao*
Robot learning from humans has been proposed and researched for several decades as a means to enable robots to learn new skills or adapt existing ones to new situations. Recent advances in artificial intelligence, including learning approaches like reinforcement learning and...
Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation
Kumud Tripathi, Aditya Srinivas Menon, Aman Gupta, Raj Prakash Gohil, Pankaj Wasnik
The Whisper model, an open-source automatic speech recognition system, is widely adopted for its strong performance across multilingual and zero-shot settings. However, it frequently suffers from hallucination errors, especially under noisy acoustic conditions. Previous work...
XAI-Guided Continual Learning: Rationale, Methods, and Future Directions
Michela Proietti*, Alessio Ragno*, Roberto Capobianco
Providing neural networks with the ability to learn new tasks sequentially represents one of the main challenges in artificial intelligence. Unlike humans, neural networks are prone to losing previously acquired knowledge upon learning new information, a phenomenon known as ...
Interpretable Memory-based Prototypical Pooling
Alessio Ragno*, Roberto Capobianco
Graph Neural Networks (GNNs) have proven their effectiveness in various graph-structured data applications. However, one of the significant challenges in the realm of GNNs is representation learning, a critical concept that bridges graph pooling, aimed at creating compressed...
Intermediate Layers of LLMs Align Best With the Brain by Balancing Short- and Long-Range Information
Michela Proietti*, Roberto Capobianco, Mariya Toneva
Contextual integration is fundamental to human language comprehension. Language models are a powerful tool for studying how contextual information influences brain activity. In this work, we analyze the brain alignment of three types of language models, which vary in how the...
ProtoCRL: Prototype-based Network for Continual Reinforcement Learning
Michela Proietti*, Peter R. Wurman, Peter Stone, Roberto Capobianco
The purpose of continual reinforcement learning is to train an agent on a sequence of tasks such that it learns the ones that appear later in the sequence while retaining the ability to perform the tasks that appeared earlier. Experience replay is a popular method used to mak...
Automated Reward Design for Gran Turismo
Michel Ma, Takuma Seno, Kaushik Subramanian, Peter R. Wurman, Peter Stone, Craig Sherstan
When designing reinforcement learning (RL) agents, a designer communicates the desired agent behavior through the definition of reward functions - numerical feedback given to the agent as reward or punishment for its actions. However, mapping desired behaviors to reward func...
Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry
Junyoung Seo*, Jisang Han*, Jaewoo Jung*, Siyoon Jin, JoungBin Lee*, Takuya Narihira, Kazumi Fukuda, Takashi Shibuya, Donghoon Ahn, Shoukang Hu, Seungryong Kim*, Yuki Mitsufuji
We introduce Vid-CamEdit, a novel framework for video camera trajectory editing, enabling the re-synthesis of monocular videos along user-defined camera paths. This task is challenging due to its ill-posed nature and the limited multi-view video data for training. Traditiona...
SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing
Xinlei Niu, Kin Wai Cheuk, Jing Zhang, Naoki Murata, Chieh-Hsin Lai, Michele Mancusi, Woosung Choi, Giorgio Fabbro*, Wei-Hsiang Liao, Charles Patrick Martin, Yuki Mitsufuji
Music editing is an important step in music production, which has broad applications, including game development and film production. Most existing zero-shot text-guided methods rely on pretrained diffusion models by involving forward-backward diffusion processes for editing...
Responsibly Training Foundation Models: Actualizing Ethical Principles for Curating Large-Scale Training Datasets in the Era …
Morgan Klaus Scheuerman, Dora Zhao*, Jerone T. A. Andrews, Abeba Birhane, Q. Vera Liao*, Georgia Panagiotidou*, Pooja Chitre*, Kathleen Pine, Shawn Walker*, Jieyu Zhao*, Alice Xiang
AI technologies have become ubiquitous, influencing domains from healthcare to finance and permeating our daily lives. Concerns about the values underlying the creation and use of datasets to develop AI technologies are growing. Current dataset practices often disregard crit...
How Data Workers Shape Datasets: The Role of Positionality in Data Collection and Annotation for Computer Vision
Morgan Klaus Scheuerman, Allison Woodruff, Jed R. Brubaker
Data workers play a key role in the big data industry. Clients hire data workers to collect and annotate data with human identity concepts, like demographic categories or clothing items. Often, such workers are treated as computational; they are expected to quickly and object...
Learning Hierarchical Line Buffer for Image Processing
Jiacheng Li, Feiran Li, Daisuke Iso
In recent years, neural networks have achieved significant progress in offline image processing. However, in online scenarios, particularly in on-chip implementations, memory usage emerges as a critical bottleneck due to the limited memory resources of integrated image proce...
Music Arena: Live Evaluation for Text-to-Music
Yonghyun Kim, Wayne Chi, Anastasios N. Angelopoulos, Wei-Lin Chiang, Koichi Saito, Shinji Watanabe, Yuki Mitsufuji, Chris Donahue
We present Music Arena, an open platform for scalable human preference evaluation of text-to-music (TTM) models. Soliciting human preferences via listening studies is the gold standard for evaluation in TTM, but these studies are expensive to conduct and difficult to compare...
Large-Scale Training Data Attribution for Music Generative Models via Unlearning
Woosung Choi, Junghyun Koo, Kin Wai Cheuk, Joan Serrà, Marco A. Martínez-Ramírez, Yukara Ikemiya, Naoki Murata, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji
This paper explores the use of unlearning methods for training data attribution (TDA) in music generative models trained on large-scale datasets. TDA aims to identify which specific training data points contributed to the generation of a particular output from a specific mod...
Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion
Michail Dontas, Yutong He, Naoki Murata, Yuki Mitsufuji, J. Zico Kolter*, Ruslan Salakhutdinov*
Blind inverse problems, where both the target data and forward operator are unknown, are crucial to many computer vision applications. Existing methods often depend on restrictive assumptions such as additional training, operator linearity, or narrow image distributions, thu...
Enhancing neural audio fingerprint robustness to audio degradation for music identification
R. Oguz Araz, Guillem Cortès-Sebastià, Emilio Molina, Joan Serrà, Xavier Serra, Yuki Mitsufuji, Dmitry Bogdanov
Audio fingerprinting (AFP) allows the identification of unknown audio content by extracting compact representations, termed audio fingerprints, that are designed to remain robust against common audio degradations. Neural AFP methods often employ metric learning, where repres...
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov*, J. Zico Kolter*
Prompt engineering is an effective but labor-intensive way to control text-to-image (T2I) generative models. Its time-intensive nature and complexity have spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transf...
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
Muhammad Jehanzeb Mirza, Mengjie Zhao*, Zhuoyuan Mao, Sivan Doveh, Wei Lin, Paul Gavrikov, Michael Dorkenwald, Shiqi Yang*, Saurav Jha, Hiromi Wakaki*, Yuki Mitsufuji
In this work, we propose GLOV, which enables Large Language Models (LLMs) to act as implicit optimizers for Vision-Language Models (VLMs) to enhance downstream vision tasks. GLOV prompts an LLM with the downstream task description, querying it for suitable VLM prompts (e.g.,...
G2D2: Gradient-Guided Discrete Diffusion for Image Inverse Problem Solving
Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Bac Nguyen, Stefano Ermon*, Yuki Mitsufuji
Recent literature has effectively leveraged diffusion models trained on continuous variables as priors for solving inverse problems. Notably, discrete diffusion models with discrete latent codes have shown strong performance, particularly in modalities suited for discrete co...
Reductive, Exclusionary, Normalising: The Limits of Generative AI
Fabio Morreale, Marco A. Martínez-Ramírez, Raul Masu, Wei-Hsiang Liao, Yuki Mitsufuji
Up until recently, most approaches to music generation were based on deductive logic: generative rules were devised on the basis of musicians’ preferences, subjective appreciation and dominant music theories. Machine learning (ML) introduced a paradigm shift: vast datasets o...
Reverse Engineering of Music Mixing Graphs With Differentiable Processors and Iterative Pruning
Sungho Lee*, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich*, Giorgio Fabbro*, Kyogu Lee*, Yuki Mitsufuji
Reverse engineering of music mixes aims to uncover how dry source signals are processed and combined to produce a final mix. In this paper, prior works are extended to reflect the compositional nature of mixing and search for a graph of audio processors. First, a mixing cons...
DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions
Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Ben Hayes, Wei-Hsiang Liao, György Fazekas, Yuki Mitsufuji
This study introduces a novel and interpretable model, DiffVox, for matching vocal effects in music production. DiffVox, short for "Differentiable Vocal Fx", integrates parametric equalisation, dynamic range control, delay, and reverb with efficient differentiable implement...
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Yuki Mitsufuji, György Fazekas
Style Transfer with Inference-Time Optimisation (ST-ITO) is a recent approach for transferring the applied effects of a reference audio to a raw audio track. It optimises the effect parameters to minimise the distance between the style embeddings of the processed audio and t...
In-Domain African Languages Translation Using LLMs and Multi-armed Bandits
Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik
Neural Machine Translation (NMT) systems face significant challenges when working with low-resource languages, particularly in domain adaptation tasks. These difficulties arise due to limited training data and suboptimal model generalization. As a result, selecting an opti- ...
Graph-Assisted Culturally Adaptable Idiomatic Translation for Indic languages
Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik
Translating multi-word expressions (MWEs) and idioms requires a deep understanding of the cultural nuances of both the source and target languages. This challenge is further amplified by the one-to-many nature of idiomatic translations, where a single source idiom can have m...
Can Large Language Models Predict Audio Effects Parameters from Natural Language?
Seungheon Doh*, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Juhan Nam*, Yuki Mitsufuji
In music production, manipulating audio effects (Fx) parameters through natural language has the potential to reduce technical barriers for non-experts. We present LLM2Fx, a framework leveraging Large Language Models (LLMs) to predict Fx parameters directly from textual desc...
Fx-Encoder++: Extracting Instrument-Wise Audio Effects Representations from Mixtures
Yen-Tung Yeh, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yi-Hsuan Yang, Yuki Mitsufuji
General-purpose audio representations have proven effective across diverse music information retrieval applications, yet their utility in intelligent music production remains limited by insufficient understanding of audio effects (Fx). Although previous approaches have empha...
ITO-Master: Inference-Time Optimization for Audio Effects Modeling of Music Mastering Processors
Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro*, Michele Mancusi, Yuki Mitsufuji
Music mastering style transfer aims to model and apply the mastering characteristics of a reference track to a target track, simulating the professional mastering process. However, existing methods apply fixed processing based on a reference track, limiting users' ability to...
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon
Recent advances in text-to-music editing, which employ text queries to modify music (e.g. by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been co...
I See, Therefore I Do: Estimating Causal Effects for Image Treatments
A Thorat, R Kolla, Niranjan Pedanekar*
Causal effect estimation under observational studies is challenging due to the lack of ground truth data and treatment assignment bias. Though various methods exist in literature for addressing this problem, most of them ignore multi-dimensional treatment information by cons...
Bridging Perceptual Gaps in Food NLP: A Structured Approach Using Sensory Anchors
Kana Maruyama, Angel Hsing-Chi Hwang, Tarek R Besold
Understanding how humans perceive and describe food is essential for NLP applications such as semantic search, recommendation, and structured food communication. However, textual similarity often fails to reflect perceptual similarity, which is shaped by sensory experience, ...
GENIE-ASI: Generative Instruction and Executable Code for Analog Subcircuit Identification
Phuoc Pham, Arun Venkitaraman, Chia-Yu Hsieh, Andrea Bonetti, Stefan Uhlich*, Markus Leibl, Simon Hofmann, Eisaku Ohbuchi, Lorenzo Servadei, Ulf Schlichtmann, Robert Wille
Analog subcircuit identification is a core task in analog design, essential for simulation, sizing, and layout. Traditional methods often require extensive human expertise, rule-based encoding, or large labeled datasets. To address these challenges, we propose GENIE-ASI, the...
CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation
Yuanhong Chen, Kazuki Shimada, Christian Simon, Yukara Ikemiya, Takashi Shibuya, Yuki Mitsufuji
Binaural audio generation (BAG) aims to convert monaural audio to stereo audio using visual prompts, requiring a deep understanding of spatial and semantic information. The success of the BAG systems depends on the effectiveness of cross-modal reasoning and spatial understan...
Schemato -- An LLM for Netlist-to-Schematic Conversion
Ryoga Matsuo, Stefan Uhlich*, Arun Venkitaraman, Andrea Bonetti, Chia-Yu Hsieh, Ali Momeni, Lukas Mauch*, Augusto Capone, Eisaku Ohbuchi, Lorenzo Servadei
Machine learning models are advancing circuit design, particularly in analog circuits. They typically generate netlists that lack human interpretability. This is a problem as human designers heavily rely on the interpretability of circuit diagrams or schematics to intuitivel...
TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models
Christian Simon, Masato Ishii, Akio Hayakawa, Zhi Zhong*, Shusuke Takahashi*, Takashi Shibuya, Yuki Mitsufuji
Recently developed conditional diffusion models still require heavy supervised fine-tuning to perform control on a category of tasks. Training-free conditioning via guidance with off-the-shelf models is a favorable alternative to avoid further fine-tuning on th...
Beyond RGB: Adaptive Parallel Processing for RAW Object Detection
Shani Gamrian, Hila Barel, Feiran Li, Masakazu Yoshimura*, Daisuke Iso
Object detection models are typically applied to standard RGB images processed through Image Signal Processing (ISP) pipelines, which are designed to enhance sensor-captured RAW images for human vision. However, these ISP functions can lead to a loss of critical information ...
Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution
Vlad Hosu, Lorenzo Agnolucci, Daisuke Iso, Dietmar Saupe*
Image Quality Assessment (IQA) measures and predicts perceived image quality by human observers. Although recent studies have highlighted the critical influence that variations in the scale of an image have on its perceived quality, this relationship has not been systematica...
DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic
Munish Monga, Vishal Chudasama, Pankaj Wasnik, Biplab Banerjee*
Real-world object detection systems, such as those in autonomous driving and surveillance, must continuously learn new object categories and simultaneously adapt to changing environmental conditions. Existing approaches, Class Incremental Object Detection (CIOD) and Domain I...
Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models
Zerui Tao, Yuhta Takida, Naoki Murata, Qibin Zhao*, Yuki Mitsufuji
Parameter-Efficient Fine-Tuning (PEFT) of text-to-image models has become an increasingly popular technique with many applications. Among the various PEFT methods, Low-Rank Adaptation (LoRA) and its variants have gained significant attention due to their effectiveness, enabl...
Transphobia is in the Eye of the Prompter: Trans-Centered Perspectives on Large Language Models
Morgan Klaus Scheuerman, Katy Weathington, Adrian Petterson, Dylan Thomas Doyle, Dipto Das, Michael Ann DeVito, Jed R. Brubaker
Large language models (LLMs) are the new hot trend being rapidly integrated into products and services, often in chatbots. LLM-powered chatbots are expected to respond to any number of topics, including topics central to gender identity. In light of rising anti-trans discour...
Literature-based Hypothesis Generation: Predicting the evolution of scientific literature to support scientists
Tarek R Besold, Uchenna Akujuobi, Samy Badreddine, Jihun Choi, Hatem ElShazly, Frederick Gifford, Kana Maruyama, Kae Nagano, Pablo Sanchez Martin, Thiviyan Thanapalasingam, Alessandra Toniato, Christoph Wehner
Science is advancing at an increasingly quick pace, as evidenced, for instance, by the exponential growth in the number of published research articles per year [1]. On the one hand, this poses an increasingly pressing challenge: Effectively navigating this ever-growing body o...
Gastro-Health Project: Revolutionizing Personalized Nutrition and Health Forecasting Through Integrated AI Technologies
Uchenna Akujuobi, Jiu Yi, Maria Enrique Chung, Tarek Besold
Knowledge graphs are powerful tools for modelling complex, multi-relational data and supporting hypothesis generation, particularly in applications like drug repurposing. However, for predictive methods to gain acceptance as credible scientific tools, they must ensure not on...
Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance
Sanchayan Santra, Vishal Chudasama, Pankaj Wasnik, Vineeth N Balasubramanian
Precise Event Spotting (PES) aims to identify events and their class from long, untrimmed videos, particularly in sports. The main objective of PES is to detect the event at the exact moment it occurs. Existing methods mainly rely on features from a large pre-trained network...
A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs?
Yigitcan Özer, Woosung Choi, Joan Serrà, Mayank Kumar Singh*, Wei-Hsiang Liao, Yuki Mitsufuji
We introduce the Robust Audio Watermarking Benchmark (RAW-Bench), a benchmark for evaluating deep learning-based audio watermarking methods with standardized and systematic comparisons. To simulate real-world usage, we introduce a comprehensive audio attack pipeline with var...
Proto Successor Measure: Representing the Space of All Possible Solutions of Reinforcement Learning
Siddhant Agarwal*, Harshit Sikchi, Peter Stone, Amy Zhang*
Having explored an environment, intelligent agents should be able to transfer their knowledge to most downstream tasks within that environment. Referred to as "zero-shot learning," this ability remains elusive for general-purpose reinforcement learning algorithms. While rec...
How to Evaluate and Mitigate IP Infringement in Visual Generative AI?
Zhenting Wang, Chen Chen, Vikash Sehwag, Minzhou Pan*, Lingjuan Lyu
The popularity of visual generative AI models like DALL-E 3, Stable Diffusion XL, Stable Video Diffusion, and Sora has been increasing. Through extensive evaluation, we discovered that the state-of-the-art visual generative models can generate content that bears a striking r...Hyperspherical Normalization for Scalable Deep Reinforcement Learning
Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, Jaegul Choo
Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstab...Training Consistency Models with Variational Noise Coupling
Gianluigi Silvestri, Luca Ambrogioni, Chieh-Hsin Lai, Yuhta Takida, Yuki Mitsufuji
Consistency Training (CT) has recently emerged as a promising alternative to diffusion models, achieving competitive performance in image generation tasks. However, non-distillation consistency training often suffers from high variance and instability, and analyzing and impr...Supervised Contrastive Learning from Weakly-labeled Audio Segments for Musical Version Matching
Joan Serrà, R. Oguz Araz, Dmitry Bogdanov, Yuki Mitsufuji
Detecting musical versions (different renditions of the same piece) is a challenging task with important applications. Because of the ground truth nature, existing approaches match musical versions at the track level (e.g., whole song). However, most applications require to ...Distillation of Discrete Diffusion through Dimensional Correlations
Satoshi Hayakawa*, Yuhta Takida, Masaaki Imaizumi*, Hiromi Wakaki*, Yuki Mitsufuji
Diffusion models have demonstrated exceptional performances in various fields of generative modeling, but suffer from slow sampling speed due to their iterative nature. While this issue is being addressed in continuous domains, discrete diffusion models face unique challenge...Music Foundation Model as Generic Booster for Music Downstream Tasks
Wei-Hsiang Liao, Yuhta Takida, Yukara Ikemiya, Zhi Zhong*, Chieh-Hsin Lai, Giorgio Fabbro*, Kazuki Shimada, Keisuke Toyama*, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Shusuke Takahashi*, Stefan Uhlich*, Taketo Akama*, Woosung Choi, Yuichiro Koyama*, Yuki Mitsufuji
We demonstrate the efficacy of using intermediate representations from a single foundation model to enhance various music downstream tasks. We introduce SoniDo, a music foundation model (MFM) designed to extract hierarchical features from target music samples. By leveraging ...A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7
Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter R. Wurman
Deep reinforcement learning has achieved superhuman racing performance in high-fidelity simulators like Gran Turismo 7 (GT7). It typically utilizes global features that require instrumentation external to a car, such as precise localization of agents and opponents, limiting ...Improving Vector-Quantized Image Modeling with Latent Consistency-Matching Diffusion
Bac Nguyen, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji
By embedding discrete representations into a continuous latent space, we can leverage continuous-space latent diffusion models to handle generative modeling of discrete data. However, despite their initial success, most latent diffusion methods rely on fixed pretrained embed...A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Join…
Masato Ishii, Akio Hayakawa, Takashi Shibuya, Yuki Mitsufuji
In this work, we build a simple but strong baseline for sounding video generation. Given base diffusion models for audio and video, we integrate them with additional modules into a single model and train it to make the model jointly generate audio and video. To enhance align...MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
Kengo Uchida, Takashi Shibuya, Yuhta Takida, Naoki Murata, Julian Tanke, Shusuke Takahashi*, Yuki Mitsufuji
In text-to-motion generation, controllability as well as generation quality and speed has become increasingly critical. The controllability challenges include generating a motion of a length that matches the given textual description and editing the generated motions accordi...Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models
Jie Ren, Kangrui Chen, Yingqian Cui, Shenglai Zeng, Hui Liu, Yue Xing, Jiliang Tang, Lingjuan Lyu
Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, the advancement of T2I diffusion models presents significant risks, as the models could be exploited for malicious purposes, suc...CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI
Siyuan Cheng, Lingjuan Lyu, Zhenting Wang, Xiangyu Zhang, Vikash Sehwag
With the rapid advancement of generative AI, it is now possible to synthesize high-quality images in a few seconds. Despite the power of these technologies, they raise significant concerns regarding misuse. Current efforts to distinguish between real and AI-generated image...Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu
As scaling laws in generative AI push performance, they simultaneously concentrate the development of these models among actors with large computational resources. With a focus on text-to-image (T2I) generative models, we aim to unlock this bottleneck by demonstrating very l...Argus: A Compact and Versatile Foundation Model for Vision
Weiming Zhuang, Chen Chen, Zhizhong Li, Sina Sajadmanesh, Jingtao Li, Jiabo Huang, Vikash Sehwag, Vivek Sharma, Hirotaka Shinozaki, Felan Carlo Garcia, Yihao Zhan, Naohiro Adachi, Ryoji Eki, Michael Spranger, Peter Stone, Lingjuan Lyu
While existing vision and multi-modal foundation models can handle multiple computer vision tasks, they often suffer from significant limitations, including huge demand for data and computational resources during training and inconsistent performance across vision tasks at d...ReRAW: RGB-to-RAW Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge
Radu Berdan, Beril Besbinar, Christoph Reinders, Junji Otsuka*, Daisuke Iso
Edge-based computer vision models running on compact, resource-limited devices benefit greatly from using unprocessed, detail-rich RAW sensor data instead of processed RGB images. Training these models, however, necessitates large labeled RAW datasets, which are costly and o...Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising
Feiran Li, Haiyang Jiang, Daisuke Iso
Noise synthesis is a promising solution for addressing the data shortage problem in data-driven low-light RAW image denoising. However, accurate noise synthesis methods often necessitate labor-intensive calibration and profiling procedures during preparation, preventing them...VinaBench: Benchmark for Faithful and Consistent Visual Narratives
Silin Gao*, Sheryl Mathew, Li Mi, Sepideh Mamooler, Mengjie Zhao*, Hiromi Wakaki*, Yuki Mitsufuji, Syrielle Montariol, Antoine Bosselut*
Visual narrative generation transforms textual narratives into sequences of images illustrating the content of the text. However, generating visual narratives that are faithful to the input text and self-consistent across generated images remains an open challenge, due to th...LLM-BRec: Personalizing Session-based Social Recommendation with LLM-BERT Fusion Framework
Raksha Jalan, Tushar Prakash, Niranjan Pedanekar*
Recommendation models enhance online user engagement by suggesting personalized content, boosting satisfaction and retention. Session-based Recommender systems (SR) have become a significant approach, focusing on capturing users' short-term preferences for more accurate reco...Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction
Kritarth Prasad, Mohammadi Zaki, Pratik Singh, Pankaj Wasnik
Ensembling neural machine translation (NMT) models to produce higher-quality translations than the $L$ individual models has been extensively studied. Recent methods typically employ a candidate selection block (CSB) and an encoder-decoder fusion block (FB), requiring infere...Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space
Yangming Li, Chieh-Hsin Lai, Carola-Bibiane Schönlieb, Yuki Mitsufuji, Stefano Ermon*
Deep Generative Models (DGMs), including Energy-Based Models (EBMs) and Score-based Generative Models (SGMs), have advanced high-fidelity data generation and complex continuous distribution approximation. However, their application in Markov Decision Processes (MDPs), partic...Classifier-Free Guidance inside the Attraction Basin May Cause Memorization
Anubhav Jain, Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, Julian Togelius, Yuki Mitsufuji
Diffusion models are prone to exactly reproduce images from the training data. This exact reproduction of the training data is concerning as it can lead to copyright infringement and/or leakage of privacy-sensitive information. In this paper, we present a novel way to unders...MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, Alexander Schwing, Yuki Mitsufuji
We propose to synthesize high-quality and synchronized audio, given video and optional text conditions, using a novel multimodal joint training framework MMAudio. In contrast to single-modality training conditioned on (limited) video data only, MMAudio is jointly trained wit...Transformative Movie Discovery: Large Language Models for Recommendation and Genre Prediction
Shubham Raj, Anurag Sharma, Sriparna Saha*, Brijraj Singh*, Niranjan Pedanekar*
In the era of digital streaming platforms, personalized movie recommendations, and genre prediction have become pivotal for enhancing user engagement and satisfaction. With the growing number of OTT (Over-The-Top) platforms like Netflix, Amazon Prime Video, and Disney+, the ...Efficacy of Large Language Models in Predicting Hindi Movies' Attributes: A Comprehensive Survey and Content-Based Analysis
Prabir Mondal*, Siddharth Singh*, Kushum*, Sriparna Saha*, Jyoti Prakash Singh*, Brijraj Singh*, Niranjan Pedanekar*
This research explores the efficacy of four state-of-the-art Large Language Models (LLMs): GPT-3.5-turbo-0301, Vicuna, PaLM 2, and Dolly in predicting (i) movie genres using audio transcripts of movie trailers and (ii) meta-information such as director and cast details using...Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Model
Jie Ren, Kangrui Chen, Chen Chen, Vikash Sehwag, Yue Xing, Jiliang Tang, Lingjuan Lyu
Large Language Models (LLMs) and Vision-Language Models (VLMs) have made significant advancements in a wide range of natural language processing and vision-language tasks. Access to large web-scale datasets has been a key factor in their success. However, concerns have been ...Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection
Ayush Ghadiya, Purbayan Kar, Vishal Chudasama, Pankaj Wasnik
Recently, weakly supervised video anomaly detection (WS-VAD) has emerged as a contemporary research direction to identify anomaly events like violence and nudity in videos using only video-level labels. However, this task has substantial challenges, including addressing imba...VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression
Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo, Yukara Ikemiya, Zhi Zhong*, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee*, Wei-Hsiang Liao, Yuki Mitsufuji
Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of rate-distortion tradeoff, particularly ...Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer
Michele Mancusi, Yurii Halychanskyi, Kin Wai Cheuk, Eloi Moliner, Chieh-Hsin Lai, Stefan Uhlich*, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro*, Yuki Mitsufuji
Music timbre transfer is a challenging task that involves modifying the timbral characteristics of an audio signal while preserving its melodic structure. In this paper, we propose a novel method based on dual diffusion bridges, trained using the CocoChorales Dataset, which ...30+ Years of Source Separation Research: Achievements and Future Challenges
Shoko Araki, Nobutaka Ito, Reinhold Haeb-Umbach, Gordon Wichern, Zhong-Qiu Wang, Yuki Mitsufuji
Source separation (SS) of acoustic signals is a research field that emerged in the mid-1990s and has flourished ever since. On the occasion of ICASSP's 50th anniversary, we review the major contributions and advancements in the past three decades in the speech, audio, and mu...Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning
Yuchen Liu*, Chen Chen, Lingjuan Lyu, Yaochu Jin, Gang Chen*
Federated Learning (FL) is notorious for its vulnerability to Byzantine attacks. Most current Byzantine defenses share a common inductive bias: among all the gradients, the densely distributed ones are more likely to be honest. However, such a bias is a poison to Byzantine r...Identifying Candidates for Protein-Protein Interaction: A Focus on NKp46’s Ligands
Alessia Borghini, Federico Di Valerio, Alessio Ragno*, Roberto Capobianco
Recent advances in protein-protein interaction (PPI) research have harnessed the power of artificial intelligence (AI) to enhance our understanding of protein behaviour. These approaches have become indispensable tools in the field of biology and medicine, enabling scientists ...Neural Reward Machines
Elena Umili*, Francesco Argenziano*, Roberto Capobianco
Non-Markovian Reinforcement Learning (RL) tasks are very hard to solve, because agents must consider the entire history of state-action pairs to act rationally in the environment. Most works use symbolic formalisms (such as Linear Temporal Logic or automata) to specify the temporall...Transparent Explainable Logic Layers
Alessio Ragno*, Marc Plantevit, Celine Robardet, Roberto Capobianco
Explainable AI seeks to unveil the intricacies of black box models through post-hoc strategies or self-interpretable models. In this paper, we tackle the problem of building layers that are intrinsically explainable through logical rules. In particular, we address current st...AIM 2024 Challenge on UHD Blind Photo Quality Assessment
Vlad Hosu, Marcos V. Conde, Lorenzo Agnolucci, Nabajeet Barman, Saman Zadtootaghaj, Radu Timofte
We introduce the AIM 2024 UHD-IQA Challenge, a competition to advance the No-Reference Image Quality Assessment (NR-IQA) task for modern, high-resolution photos. The challenge is based on the recently released UHD-IQA Benchmark Database, which comprises 6,073 UHD-1 (4K) imag...UHD-IQA Benchmark Database: Pushing the Boundaries of Blind Photo Quality Assessment
Vlad Hosu, Lorenzo Agnolucci, Oliver Wiedemann, Daisuke Iso, Dietmar Saupe*
We introduce a novel Image Quality Assessment (IQA) dataset comprising 6073 UHD-1 (4K) images, annotated at a fixed width of 3840 pixels. Contrary to existing No-Reference (NR) IQA datasets, ours focuses on highly aesthetic photos of high technical quality, filling a gap in ...Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation
Yin-Jyun Luo, Kin Wai Cheuk, Woosung Choi, Toshimitsu Uesaka, Keisuke Toyama*, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Wei-Hsiang Liao, Simon Dixon, Yuki Mitsufuji
Existing work on pitch and timbre disentanglement has been mostly focused on single-instrument music audio, excluding the cases where multiple instruments are presented. To fill the gap, we propose DisMix, a generative framework in which the pitch and timbre representations ...SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
Koichi Saito, Dongjun Kim*, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong*, Yuhta Takida, Yuki Mitsufuji
Sound content is an indispensable element for multimedia works such as video games, music, and films. Recent high-quality diffusion-based sound generation models can serve as valuable tools for the creators. However, despite producing high-quality sounds, these models often ...LOCKEY: A Novel Approach to Model Authentication and Deepfake Tracking
Mayank Kumar Singh*, Naoya Takahashi, Wei-Hsiang Liao, Yuki Mitsufuji
This paper presents a novel approach to deter unauthorized deepfakes and enable user tracking in generative models, even when the user has full access to the model parameters, by integrating key-based model authentication with watermarking techniques. Our method involves pro...RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation
Christoph Reinders, Radu Berdan, Beril Besbinar, Junji Otsuka*, Daisuke Iso
Current deep learning approaches in computer vision primarily focus on RGB data sacrificing information. In contrast, RAW images offer richer representation, which is crucial for precise recognition, particularly in challenging conditions like low-light environments. The res...SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno
Recent advances in CV and NLP have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simpli...Residual-MPPI: Online Policy Customization for Continuous Control
Pengcheng Wang, Chenran Li, Catherine Weaver*, Kenta Kawamoto, Masayoshi Tomizuka*, Chen Tang*, Wei Zhan*
Policies learned through Reinforcement Learning (RL) and Imitation Learning (IL) have demonstrated significant potential in achieving advanced performance in continuous control tasks. However, in real-world environments, it is often necessary to further customize a trained pol...Weighted Point Cloud Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric
Toshimitsu Uesaka, Taiji Suzuki, Yuhta Takida, Chieh-Hsin Lai, Naoki Murata, Yuki Mitsufuji
In typical multimodal contrastive learning, such as CLIP, encoders produce one point in the latent representation space for each input. However, one-point representation has difficulty in capturing the relationship and the similarity structure of a huge amount of instances in...Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
Shang-Fu Chen, Chieh-Hsin Lai, Dongjun Kim*, Naoki Murata, Takashi Shibuya, Wei-Hsiang Liao, Shao-Hua Sun, Yuki Mitsufuji, Ayano Hiranaka
Controllable generation through Stable Diffusion (SD) fine-tuning aims to improve fidelity, safety, and alignment with human guidance. Existing reinforcement learning from human feedback methods usually rely on predefined heuristic reward functions or pretrained reward model...Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
Saurav Jha, Shiqi Yang*, Masato Ishii, Mengjie Zhao*, Christian Simon, Muhammad Jehanzeb Mirza, Dong Gong, Lina Yao, Shusuke Takahashi*, Yuki Mitsufuji
Personalized text-to-image diffusion models have grown popular for their ability to efficiently acquire a new concept from user-defined text descriptions and a few images. However, in the real world, a user may wish to personalize a model on multiple concepts but one at a ti...Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models
Yong-Hyun Park, Chieh-Hsin Lai, Satoshi Hayakawa*, Yuhta Takida, Yuki Mitsufuji
Diffusion models have seen notable success in continuous domains, leading to the development of discrete diffusion models (DDMs) for discrete variables. Despite recent advances, DDMs face the challenge of slow sampling speeds. While parallel sampling methods like τ-leaping ac...Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji
In this study, we aim to construct an audio-video generative model with minimal computational cost by leveraging pre-trained single-modal generative models for audio and video. To achieve this, we propose a novel method that guides single-modal models to cooperatively genera...SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
Koichi Saito, Dongjun Kim*, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong*, Yuhta Takida, Yuki Mitsufuji
Sound content creation, essential for multimedia works such as video games and films, often involves extensive trial-and-error, enabling creators to semantically reflect their artistic ideas and inspirations, which evolve throughout the creation process, into the sound. Rece...Dobby: A Conversational Service Robot Driven by GPT-4
Carson Stark, Bohkyung Chun, Casey Charleston, Varsha Ravi, Luis Pabon, Surya Sunkari, Tarun Mohan, Peter Stone, Justin Hart*
This work introduces a robotics platform which embeds a conversational AI agent in an embodied system for natural language understanding and intelligent decision-making for service tasks; integrating task planning and human-like conversation. The agent is derived from a larg...Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning
Jiaheng Hu*, Roberto Martin-Martin*, Peter Stone, Zizhao Wang*
A hallmark of intelligent agents is the ability to learn reusable skills purely from unsupervised interaction with the environment. However, existing unsupervised skill discovery methods often learn entangled skills where one skill variable simultaneously influences many ent...SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions
Zizhao Wang*, Jiaheng Hu*, Roberto Martin-Martin*, Amy Zhang*, Scott Niekum*, Peter Stone, Caleb Chuck, Stephen Chen
Unsupervised skill discovery carries the promise that an intelligent agent can learn reusable skills through autonomous, reward-free environment interaction. Existing unsupervised skill discovery methods learn skills by encouraging distinguishable behaviors that cover divers...N-Agent Ad Hoc Teamwork
Caroline Wang*, Arrasy Rahman*, Ishan Durugkar, Elad Liebman*, Peter Stone
Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algor...Learning to Look: Seeking Information for Decision Making via Policy Factorization
Jiaheng Hu*, Peter Stone, Roberto Martin-Martin*, Ben Abbatematteo, Shivin Dass
Many robot manipulation tasks require active or interactive exploration behavior in order to be performed successfully. Such tasks are ubiquitous in embodied domains, where agents must actively search for the information necessary for each stage of a task, e.g., moving the h...LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning
Zifan Xu*, Peter Stone, Dmitriy Bespalov, Haozhu Wang, Xian Wu, Yanjun Qi
Chain-of-thought (CoT) prompting is a popular in-context learning (ICL) approach for large language models (LLMs), especially when tackling complex reasoning tasks. Traditional ICL approaches construct prompts using examples that contain questions similar to the input questio...Open-Set Object Detection By Aligning Known Class Representations
Vishal Chudasama, Naoyuki Onoe*, Pankaj Wasnik, Hiran Sarkar, Vineeth N Balasubramanian
Open-Set Object Detection (OSOD) has emerged as a contemporary research direction to address the detection of unknown objects. Recently, a few works have achieved remarkable performance in the OSOD task by employing contrastive clustering to separate unknown classes. In contra...Enhancing Whisper's Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization
Pankaj Wasnik, Kumud Tripathi, Raj Gothi
Automatic speech recognition has recently seen a significant advancement with large foundational models such as Whisper. However, these models often struggle to perform well in low-resource languages, such as Indian languages. This paper explores two novel approaches to enha...EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
Ashishkumar Gudmalwar, Nirmesh Shah*, Pankaj Wasnik, Ishan Biyani, Rajiv R. Shah
The Emotional Voice Conversion (EVC) aims to convert the discrete emotional state from the source emotion to the target for a given speech utterance while preserving linguistic content. In this paper, we propose regularizing emotion intensity in the diffusion-based EVC frame...Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs
Mohammadi Zaki, Pankaj Wasnik, Pratik Rakesh Singh
We address the challenging task of neural machine translation (NMT) in the entertainment domain, where the objective is to automatically translate a given dialogue from a source language content to a target language. This task has various applications, particularly in automa...DeepDFA: Automata Learning through Neural Probabilistic Relaxations
Elena Umili*, Roberto Capobianco
In this work, we introduce DeepDFA, a novel approach to identifying Deterministic Finite Automata (DFAs) from traces, harnessing a differentiable yet discrete model. Inspired by both the probabilistic relaxation of DFAs and Recurrent Neural Networks (RNNs), our model offers ...Discogs-VINet-MIREX
Xavier Serra, Yuki Mitsufuji, R.O. Araz, J. Serrà, D. Bogdanov
This technical report presents our submission to the cover song identification task for the 2024 edition of the Music Information Retrieval Evaluation eXchange (MIREX). For this submission, we enhanced our Discogs-VINet model by changing the definition of an epoch, incorpora...N-agent Ad Hoc Teamwork
Caroline Wang*, Arrasy Rahman*, Ishan Durugkar, Elad Liebman*, Peter Stone
Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algor...Towards Exact Gradient-based Training on Analog In-memory Computing
Zhaoxian Wu*, Tayfun Gokmen*, Malte J. Rasch, Tianyi Chen*
Analog in-memory accelerators present a promising solution for energy-efficient training and inference of large vision or language models. While the inference on analog accelerators has been studied recently, the analog training perspective is under-explored. Recent studies ...PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher
Dongjun Kim*, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon*
To accelerate sampling, diffusion models (DMs) are often distilled into generators that directly map noise to data in a single step. In this approach, the resolution of the generator is fundamentally limited by that of the teacher DM. To overcome this limitation, we propose ...GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping
Junyoung Seo*, Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, Chieh-Hsin Lai, Seungryong Kim*, Yuki Mitsufuji
Generating novel views from a single image remains a challenging task due to the complexity of 3D scenes and the limited diversity in the existing multi-view datasets to train a model on. Recent research combining large-scale text-to-image (T2I) models with monocular depth e...A Taxonomy of Challenges to Curating Fair Datasets
Dora Zhao*, Morgan Klaus Scheuerman, Pooja Chitre*, Jerone Andrews, Georgia Panagiotidou*, Shawn Walker*, Kathleen H. Pine*, Alice Xiang
Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade...Policy composition via multi-objective reinforcement learning
Shruti Mishra, Ankit Anand, Jordan Hoffmann, Nicolas Heess, Martin Riedmiller, Abbas Abdolmaleki, Doina Precup
We enable reinforcement learning agents to learn successful behavior policies by utilizing relevant pre-existing teacher policies. The teacher policies are introduced as objectives, in addition to the task objective, in a multi-objective policy optimization setting. Using th...FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low-Rank Adaptations
Lingjuan Lyu, Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Ang Li
The rapid development of Large Language Models (LLMs) has been pivotal in advancing AI, with pre-trained LLMs being adaptable to diverse downstream tasks through fine-tuning. Federated learning (FL) further enhances fine-tuning in a privacy-aware manner by utilizing clients'...pFedClub: Controllable Heterogeneous Model Aggregation for Personalized Federated Learning
Jiaqi Wang*, Lingjuan Lyu, Fenglong Ma*, Qi Li
Federated learning, a pioneering paradigm, enables collaborative model training without exposing users’ data to central servers. Most existing federated learning systems necessitate uniform model structures across all clients, restricting their practicality. Several methods ...CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence
Chaochao Chen*, Yizhao Zhang*, Lingjuan Lyu, Yuyuan Li*, Jiaming Zhang, Li Zhang, Biao Gong, Chenggang Yan
With increasing privacy concerns in artificial intelligence, regulations have mandated the right to be forgotten, granting individuals the right to withdraw their data from models. Machine unlearning has emerged as a potential solution to enable selective forgetting in model...FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection
Jiaqi Wang*, Lingjuan Lyu, Fenglong Ma*, Xiaochen Wang, Jinghui Chen
This study introduces the Federated Medical Knowledge Injection (FedMEKI) platform, a new benchmark designed to address the unique challenges of integrating medical knowledge into foundation models under privacy constraints. By leveraging a cross-silo federated learning appr...DECO-Bench: Unified Benchmark for Decoupled Task-Agnostic Synthetic Data Release
Lingjuan Lyu, Vivek Sharma, Farzaneh Askari
In this work, we tackle the question of how to systematically benchmark task-agnostic decoupling methods for privacy-preserving machine learning (ML). Sharing datasets that include sensitive information often triggers privacy concerns, necessitating robust decoupling methods...SIMBA: Split Inference - Mechanisms, Benchmarks and Attacks
Abhishek Singh*, Vivek Sharma, Ramesh Raskar*, Rohan Sukumaran, John Mose, Jeffrey Chiu, Justin Yu
In this work, we tackle the question of how to benchmark reconstruction of inputs from deep neural networks (DNN) representations. This inverse problem is of great importance in the privacy community where obfuscation of features has been proposed as a technique for privacy-...Masked Differential Privacy
Sina Sajadmanesh, Vikash Sehwag, Lingjuan Lyu, Vivek Sharma, David Schneider, Saquib Sarfraz, Rainer Stiefelhagen
Privacy-preserving computer vision is an important emerging problem in machine learning and artificial intelligence. The prevalent methods tackling this problem use differential privacy or anonymization and obfuscation techniques to protect the privacy of individuals. In b...Prosody as an Informative Teaching Signal for Agent Learning: Exploratory Studies and Algorithmic Implications
Akanksha Saran, Matilda Knierim, Sahil Jain, Murat Han Aydoğan, Kenneth Mitra, Kush Desai, Kim Baraka*
Agent learning from human interaction often relies on explicit signals, but implicit social cues, such as prosody in speech, could provide valuable information for more effective learning. This paper advocates for the integration of prosody as a teaching signal to enhance age...Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspectiv…
Zhaotian Weng*, Zijun Gao*, Jerone Andrews, Jieyu Zhao*
Vision-language models (VLMs) pre-trained on extensive datasets can inadvertently learn biases by correlating gender information with specific objects or scenarios. Current methods, which focus on modifying inputs and monitoring changes in the model's output probability scor...Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes
Yusuke Hirota, Jerone Andrews, Dora Zhao*, Orestis Papakyriakopoulos*, Apostolos Modas, Yuta Nakashima*, Alice Xiang
We tackle societal bias in image-text datasets by removing spurious correlations between protected groups and image attributes. Traditional methods only target labeled attributes, ignoring biases from unlabeled ones. Using text-guided inpainting models, our approach ensures ...Discovering Creative Behaviors through DUPLEX: Diverse Universal Features for Policy Exploration
Borja G. Leon*, Francesco Riccio, Kaushik Subramanian, Pete Wurman, Peter Stone
The ability to approach the same problem from different angles is a cornerstone of human intelligence that leads to robust solutions and effective adaptation to problem variations. In contrast, current RL methodologies tend to lead to policies that settle on a single solutio...A Simple Background Augmentation Method for Object Detection with Diffusion Model
Yuhang Li, Xin Dong, Chen Chen, Weiming Zhuang, Lingjuan Lyu
In computer vision, it is well-known that a lack of data diversity will impair model performance. In this study, we address the challenges of enhancing the dataset diversity problem in order to benefit various downstream tasks such as object detection and instance segmentati...Efficient Bias Mitigation Without Privileged Information
Mateo Espinosa Zarlenga*, Swami Sankaranarayanan, Jerone Andrews, Zohreh Shams, Mateja Jamnik*, Alice Xiang
Deep neural networks trained via empirical risk minimisation often exhibit significant performance disparities across groups, particularly when group and task labels are spuriously correlated (e.g., “grassy background” and “cows”). Existing bias mitigation methods that aim t...Finding a needle in a haystack: A Black-Box Approach to Invisible Watermark Detection
Minzhou Pan*, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin*
In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting. WMD is capable of detecting arbitrary watermarks within a given reference dataset using a clean non-watermarked dataset as a ref...Fast and robust analog in-memory deep neural network training
Malte J. Rasch, Fabio Carta*, Omobayode Fagbohungbe*, Tayfun Gokmen*
Analog in-memory computing is a promising future technology for efficiently accelerating deep learning networks. While using in-memory computing to accelerate the inference phase has been studied extensively, accelerating the training phase has received less attention, despi...Revisiting named entity recognition in food computing: enhancing performance and robustness
Uchenna Akujuobi, Shuhong Liu*, Tarek R Besold
In the ever-evolving domain of food computing, named entity recognition (NER) presents transformative potential that extends far beyond mere word tagging in recipes. Its implications encompass intelligent recipe recommendations, health analysis, and personalization. Neverthe...Link prediction for hypothesis generation: an active curriculum learning infused temporal graph-based approach
Uchenna Akujuobi, Priyadarshini Kumari, Jihun Choi, Samy Badreddine, Kana Maruyama, Sucheendra K Palaniappan*, Tarek R Besold
Over the last few years Literature-based Discovery (LBD) has regained popularity as a means to enhance the scientific research process. The resurgent interest has spurred the development of supervised and semi-supervised machine learning models aimed at making previously imp...It is Simple Sometimes: A Study On Improving Aspect-Based Sentiment Analysis Performance
Laura Cabello*, Uchenna Akujuobi
Aspect-Based Sentiment Analysis (ABSA) involves extracting opinions from textual data about specific entities and their corresponding aspects through various complementary subtasks. Several prior studies have focused on developing ad hoc designs of varying complexities for t...Analysis of Multi-Source Language Training in Cross-Lingual Transfer
Seong Hoon Lim*, Taejun Yun*, Jinhyeon Kim*, Jihun Choi, Taeuk Kim
The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have contributed to addressing this data scarcity problem, ...A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo
Miguel Vasco*, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Pete Wurman, Peter Stone
Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics. Recently, an end-to-end deep reinforcement learning agent met this challenge in a high-fidelity racing simulator, Gran Tu...The whole is greater than the sum of its parts: improving music source separation by bridging networks
Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich*, Shusuke Takahashi*, Yuki Mitsufuji
This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) with almost no increase in computational cost. It consists of three components: (i) multi-domain loss (MDL), (ii) bridging operation...Measure dataset diversity, don’t just claim it
Dora Zhao*, Jerone T. A. Andrews, Orestis Papakyriakopoulos*, Alice Xiang
Machine learning (ML) datasets, often perceived as neutral, inherently encapsulate abstract and disputed social constructs. Dataset curators frequently employ value-laden terms such as diversity, bias, and quality to characterize datasets. Despite their prevalence, these ter...PerceptAnon: Exploring the Human Perception of Image Anonymization Beyond Pseudonymization for GDPR
Kartik Patwari, Chen-Nee Chuah*, Lingjuan Lyu, Vivek Sharma
Current image anonymization techniques, which largely focus on localized pseudonymization, typically modify identifiable features like faces or full bodies and evaluate anonymity through metrics such as detection and re-identification rates. However, this approach often overlooks ...COALA: A Practical and Vision-Centric Federated Learning Platform
Weiming Zhuang, Jian Xu, Chen Chen, Jingtao Li, Lingjuan Lyu
We present COALA, a vision-centric Federated Learning (FL) platform, and a suite of benchmarks for practical FL scenarios, which we categorize as task, data, and model levels. At the task level, COALA extends support from simple classification to 15 computer vision tasks, in...Sparo: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani*, Bac Nguyen, Samuel Lavoie*, Ranjay Krishna*, Aaron Courville*
Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion...SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
Bac Nguyen, Stefan Uhlich*, Fabien Cardinaux*, Lukas Mauch*, Marzieh Edraki*, Aaron Courville*
Handling distribution shifts from training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in the field of machine learning. While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-shot performance, further ...Analog AI as a Service: A Cloud Platform for In-Memory Computing
Kaoutar El Maghraoui*, Kim Tran*, Kurtis Ruby*, Borja Godoy*, Jordan Murray*, Manuel Le Gallo-Bourdeau*, Todd Deshane*, Pablo Gonzalez*, Diego Moreda*, Hadjer Benmeziane*, Corey Liam Lammie*, Julian Büchel*, Malte J. Rasch, Abu Sebastian*, Vijay Narayanan*
This paper introduces the Analog AI Cloud Composer platform, a service that allows users to access Analog In-Memory Computing (AIMC) simulation and computing resources over the cloud. We introduce the concept of an Analog AI as a Service (AAaaS). AIMC offers a novel approach...On the Language Encoder of Contrastive Cross-modal Models
Mengjie Zhao*, Junya Ono*, Zhi Zhong*, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Takashi Shibuya, Hiromi Wakaki*, Yuki Mitsufuji, Wei-Hsiang Liao
Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks. However, there has been limited investigation of and improvement in their language encoder, which is the central component of encoding natural language descri...DiffuCOMET: Contextual Commonsense Knowledge Diffusion
Silin Gao*, Mete Ismayilzada*, Mengjie Zhao*, Hiromi Wakaki*, Yuki Mitsufuji, Antoine Bosselut*
Inferring contextually-relevant and diverse commonsense to understand narratives remains challenging for knowledge models. In this work, we develop a series of knowledge models, DiffuCOMET, that leverage diffusion to learn to reconstruct the implicit semantic connections bet...SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond
Marco Comunità*, Zhi Zhong*, Akira Takahashi*, Shiqi Yang*, Mengjie Zhao*, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji
Recent advances in generative models that iteratively synthesize audio clips have brought great success to text-to-audio synthesis (TTA), but at the cost of slow synthesis speed and heavy computation. Although there have been attempts to accelerate the iterative procedure, high...Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio
Roser Batlle-Roca*, Wei-Hsiang Liao, Xavier Serra, Yuki Mitsufuji, Emilia Gómez*
Recent advancements in music generation are raising multiple concerns about the implications of AI in creative music processes, current business models and impacts related to intellectual property management. A relevant challenge is the potential replication and plagiarism o...A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization
Ashwinee Panda*, Xinyu Tang*, Vikash Sehwag, Saeed Mahloujifar*, Prateek Mittal*
An open problem in differentially private deep learning is hyperparameter optimization (HPO). DP-SGD introduces new hyperparameters and complicates existing ones, forcing researchers to painstakingly tune hyperparameters with hundreds of trials, which in turn makes it imposs...How to Trace Latent Generative Model Generated Images without Artificial Watermark?
Zhenting Wang, Vikash Sehwag, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas*, Shiqing Ma*
Latent generative models (e.g., Stable Diffusion) have become more and more popular, but concerns have arisen regarding potential misuse related to images generated by these models. It is, therefore, necessary to analyze the origin of images by inferring if a particular imag...SilentCipher: Deep Audio Watermarking
Mayank Kumar Singh*, Naoya Takahashi, Yuki Mitsufuji, Wei-Hsiang Liao
In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and robustness over traditional...BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay
Catherine Weaver*, Chen Tang*, Ce Hao*, Kenta Kawamoto, Masayoshi Tomizuka*, Wei Zhan*
Autonomous racing poses a significant challenge for control, requiring planning minimum-time trajectories under uncertain dynamics and controlling vehicles at their handling limits. Current methods requiring hand-designed physical models or reward functions specific to each ...State-Independent Low Resistance Drift SiSbTe Phase Change Memory for Analog In-Memory Computing Applications
HY Cheng*, Zhi-Lun Liu*, Amlan Majumdar*, Alexander Grun*, Asit Ray*, Jeff Su*, Malte J. Rasch, Fabio Carta*, Lynne Gignac*, Christian Lavoie*, Cheng-Wei Cheng*, M Bright Sky*, HL Lung*
We developed a phase-change memory (PCM), with SiSbTe material, that showed state-independent resistance drift (v~0.04) at 65°C over the entire analog conductance range. We evaluated this PCM for In Memory Compute (IMC) applications simulating the performance of BERT model w...Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning
Shivam R Mhaskar, Nirmesh Shah*, Mohammadi Zaki, Ashishkumar Gudmalwar, Pankaj Wasnik
Traditional Automatic Video Dubbing (AVD) pipeline consists of three key modules, namely, Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS). Within AVD pipelines, isometric-NMT algorithms are employed to regulate the length of the...Optimizing Movie Selections: A Multi-Task, Multi-Modal Framework with Strategies for Missing Modality Challenges
Subham Raj*, Pawan Agrawal*, Sriparna Saha*, Brijraj Singh*, Niranjan Pedanekar*
Online recommendation systems have become a crucial feature of Over-the-Top (OTT) platforms, which provide streaming media content over the internet. OTT platforms, such as Netflix, Hulu, and Amazon Prime, use recommendation systems to suggest movies, TV shows, and other con...Searching for Music Mixing Graphs: A Pruning Approach
Sungho Lee*, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich*, Giorgio Fabbro*, Kyogu Lee*, Yuki Mitsufuji
Music mixing is compositional -- experts combine multiple audio processors to achieve a cohesive mix from dry source tracks. We propose a method to reverse engineer this process from the input and output audio. First, we create a mixing console that applies all available pro...CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions
Donghee Choi*, Mogan Gim*, Donghyeon Park*, Mujeen Sung, Hyunjae Kim, Jaewoo Kang*, Jihun Choi
This paper introduces CookingSense, a descriptive collection of knowledge assertions in the culinary domain extracted from various sources, including web data, scientific papers, and recipes, from which knowledge covering a broad range of aspects is acquired. CookingSense is...BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
Takashi Shibuya, Yuhta Takida, Yuki Mitsufuji
Generative adversarial network (GAN)-based vocoders have been intensively studied because they can synthesize high-fidelity audio waveforms faster than real-time. However, it has been reported that most GANs fail to obtain the optimal projection for discriminating between re...Improving the Accuracy of Analog-Based In-Memory Computing Accelerators Post-Training
Corey Lammie*, Athanasios Vasilopoulos*, Julian Büchel*, Giacomo Camposampiero*, Manuel Le Gallo*, Malte J. Rasch, Abu Sebastian*
Analog-Based In-Memory Computing (AIMC) inference accelerators can be used to efficiently execute Deep Neural Network (DNN) inference workloads. However, to mitigate accuracy losses, due to circuit and device non-idealities, Hardware-Aware (HWA) training methodologies must b...VECL-TTS: Voice identity and Emotional style aware Cross-Lingual TTS
Ashishkumar Gudmalwar, Nirmesh Shah*, Sai Akarsh, Pankaj Wasnik
Despite the significant advancements in Text-to-Speech (TTS) systems, their full utilization in automatic dubbing remains limited. This task necessitates the extraction of voice identity and emotional style from a reference speech in a source language and subsequently transf...DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
Neha Sahipjohn, Ashishkumar Gudmalwar, Nirmesh Shah*, Pankaj Wasnik
Audio-visual alignment after dubbing is a challenging research problem. To this end, we propose a novel method, DubWise Multi-modal Large Language Model (LLM)-based Text-to-Speech (TTS), which can control the speech duration of synthesized speech in such a way that it aligns...Wait That Feels Familiar: Learning to Extrapolate Human Preferences for Preference-Aligned Path Planning.
Haresh Karnan*, Elvin Yang*, Garrett Warnell*, Joydeep Biswas*, Peter Stone
Autonomous mobility tasks such as last-mile delivery require reasoning about operator-indicated preferences over terrains on which the robot should navigate to ensure both robot safety and mission success. However, coping with out-of-distribution data from novel terrains or a...Now, Later, and Lasting: 10 Priorities for AI Research, Policy, and Practice.
Eric Horvitz*, Vincent Conitzer*, Sheila McIlraith*, Peter Stone
Advances in artificial intelligence (AI) will transform many aspects of our lives and society, bringing immense opportunities but also posing significant risks and challenges. The next several decades may well be a turning point for humanity, comparable to the industrial rev...Learning Optimal Advantage from Preferences and Mistaking it for Reward.
W. Bradley Knox*, Sigurdur Orn Adalgeirsson*, Serena Booth*, Anca Dragan*, Peter Stone, Scott Niekum*
We consider algorithms for learning reward functions from human preferences over pairs of trajectory segments, as used in reinforcement learning from human feedback (RLHF). Most recent work assumes that human preferences are generated based only upon the reward accrued withi...Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents.
Arrasy Rahman*, Jiaxun Cui*, Peter Stone
Robustly cooperating with unseen agents and human partners presents significant challenges due to the diverse cooperative conventions these partners may adopt. Existing Ad Hoc Teamwork (AHT) methods address this challenge by training an agent with a population of diverse tea...Rethinking Social Robot Navigation: Leveraging the Best of Two Worlds
Amir Hossain Raj*, Zichao Hu*, Haresh Karnan*, Rohan Chandra*, Amirreza Payandeh*, Luisa Mao*, Peter Stone, Joydeep Biswas*, Xuesu Xiao*
Empowering robots to navigate in a socially compliant manner is essential for the acceptance of robots moving in human-inhabited environments. Previously, roboticists have developed geometric navigation systems with decades of empirical validation to achieve safety and effic...The Human in the Loop: Perspectives and Challenges for RoboCup 2050.
Alessandra Rossi*, Maike Paetzel-Prüsmann*, Merel Keijsers*, Michael Anderson*, Susan Leigh Anderson*, Daniel Barry*, Jan Gutsche*, Justin Hart*, Luca Iocchi*, Ainse Kokkelmans*, Wouter Kuijpers*, Yun Liu*, Daniel Polani*, Caleb Roscon*, Marcus Scheunemann*, Peter Stone, Florian Vahl*, René van de Molengraft*, Oskar von Stryk*
Robotics researchers have been focusing on developing autonomous and human-like intelligent robots that are able to plan, navigate, manipulate objects, and interact with humans in both static and dynamic environments. These capabilities, however, are usually developed for di...HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Yuki Mitsufuji, Wei-Hsiang Liao
Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity recon...Enhancing Semantic Communication with Deep Generative Models -- An ICASSP Special Session Overview
Eleonora Grassucci*, Yuki Mitsufuji, Ping Zhang*, Danilo Comminiello*
Semantic communication is poised to play a pivotal role in shaping the landscape of future AI-driven communication systems. Its challenge of extracting semantic information from the original complex content and regenerating semantically consistent data at the receiver, possi...Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders
Hao Shi*, Kazuki Shimada, Masato Hirano*, Takashi Shibuya, Yuichiro Koyama*, Zhi Zhong*, Shusuke Takahashi*, Tatsuya Kawahara*, Yuki Mitsufuji
Diffusion-based speech enhancement (SE) has been investigated recently, but its decoding is very time-consuming. One solution is to initialize the decoding process with the enhanced feature estimated by a predictive SE system. However, this two-stage method ignores the compl...VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance
Carlos Hernandez-Olivan*, Koichi Saito, Naoki Murata, Chieh-Hsin Lai, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji
Restoring degraded music signals is essential to enhance audio quality for downstream music manipulation. Recent diffusion-based music restoration methods have demonstrated impressive performance, and among them, diffusion posterior sampling (DPS) stands out given its intrin...Zero- and Few-shot Sound Event Localization and Detection
Kazuki Shimada, Kengo Uchida, Yuichiro Koyama*, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji, Tatsuya Kawahara*
Sound event localization and detection (SELD) systems estimate direction-of-arrival (DOA) and temporal activation for sets of target classes. Neural network (NN)-based SELD systems have performed well in various sets of target classes, but they only output the DOA and tempor...Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription
Frank Cwitkowitz*, Kin Wai Cheuk, Woosung Choi, Marco A. Martínez-Ramírez, Keisuke Toyama*, Wei-Hsiang Liao, Yuki Mitsufuji
In recent years, research on music transcription has focused mainly on architecture design and instrument-specific data acquisition. With the lack of availability of diverse datasets, progress is often limited to solo-instrument tasks such as piano transcription. Several wor...Real-time Trajectory Generation via Dynamic Movement Primitives for Autonomous Racing
Catherine Weaver*, Roberto Capobianco, Peter R. Wurman, Peter Stone, Masayoshi Tomizuka*
We employ sequences of high-order motion primitives for efficient online trajectory planning, enabling competitive racecar control even when the car deviates from an offline demonstration. Dynamic Movement Primitives (DMPs) utilize a target-driven non-linear differential equ...Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning
Ce Hao*, Catherine Weaver*, Chen Tang*, Kenta Kawamoto, Masayoshi Tomizuka*, Wei Zhan*
Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e., sequences of primitive actions. Typically, a skill ...FedMef: Towards Memory-efficient Federated Dynamic Pruning
Hong Huang, Weiming Zhuang, Chen Chen, Lingjuan Lyu
Federated learning (FL) promotes decentralized training while prioritizing data confidentiality. However, its application on resource-constrained devices is challenging due to the high demand for computation and memory resources for training deep learning models. Neural netw...DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models
Zhenting Wang, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas*, Shiqing Ma*
Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding the unauthorized data usage during the training or fine-tuning process. One example is when a model trainer collects a set of im...Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning
Zizhao Wang*, Caroline Wang*, Xuesu Xiao*, Yuke Zhu*, Peter Stone
Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach towards achieving both goals is ...Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents
Arrasy Rahman*, Jiaxun Cui*, Peter Stone
Robustly cooperating with unseen agents and human partners presents significant challenges due to the diverse cooperative conventions these partners may adopt. Existing Ad Hoc Teamwork (AHT) methods address this challenge by training an agent with a population of diverse tea...Learning Optimal Advantage from Preferences and Mistaking it for Reward
W. Bradley Knox*, Stephane Hatgis-Kessell*, Sigurdur Orn Adalgeirsson*, Serena Booth*, Anca Dragan*, Peter Stone, Scott Niekum*
We consider algorithms for learning reward functions from human preferences over pairs of trajectory segments---as used in reinforcement learning from human feedback (RLHF)---including those used to fine tune ChatGPT and other contemporary language models. Most recent work o...Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators
Sikai Bai*, Shuaicheng Li*, Weiming Zhuang, Jie Zhang*, Kunlin Yang*, Jun Hou*, Shuai Yi*, Shuai Zhang*, Junyu Gao*
Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume...Collaborative Multi-Object Tracking with Conformal Uncertainty Propagation
Sanbao Su*, Songyang Han, Yiming Li*, Zhili Zhang*, Chen Feng*, Caiwen Ding*, Fei Miao*
Object detection and multiple object tracking (MOT) are essential components of self-driving systems. Accurate detection and uncertainty quantification are both critical for onboard modules, such as perception, prediction, and planning, to improve the safety and robustness o...What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?
Songyang Han, Sanbao Su*, Sihong He*, Shuo Han*, Haizhao Yang*, Shaofeng Zou*, Fei Miao*
Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state pertu...A Multi-Agent Reinforcement Learning Approach for Safe and Efficient Behavior Planning of Connected Autonomous Vehicles
Songyang Han, Shanglin Zhou*, Jiangwei Wang*, Lynn Pepin*, Caiwen Ding*, Jie Fu*, Fei Miao*
The recent advancements in wireless technology enable connected autonomous vehicles (CAVs) to gather information about their environment by vehicle-to-vehicle (V2V) communication. In this work, we design an information-sharing-based multi-agent reinforcement learning (MARL) ...Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation
Kimia Hamidieh*, Haoran Zhang*, Swami Sankaranarayanan, Marzyeh Ghassemi*
Supervised learning methods have been found to exhibit inductive biases favoring simpler features. When such features are spuriously correlated with the label, this can result in suboptimal performance on minority subgroups. Despite the growing popularity of methods which le...FedWon: Triumphing Multi-domain Federated Learning Without Normalization
Weiming Zhuang, Lingjuan Lyu
Federated learning (FL) enhances data privacy with collaborative in-situ training on decentralized clients. Nevertheless, FL encounters challenges due to non-independent and identically distributed (non-i.i.d) data, leading to potential performance degradation and hindered c...Detecting, Explaining, and Mitigating Memorization in Diffusion Models
Yuxin Wen, Yuchen Liu*, Chen Chen, Lingjuan Lyu
Recent breakthroughs in diffusion models have exhibited exceptional image-generation capabilities. However, studies show that some outputs are merely replications of training data. Such replications present potential legal challenges for model owners, especially when the gen...FedP3: Federated Personalized and Privacy-friendly Network Pruning under Model Heterogeneity
Kai Yi, Nidham Gazagnadou, Peter Richtárik*, Lingjuan Lyu
The interest in federated learning has surged in recent research due to its unique ability to train a global model using privacy-secured information held locally on each client. This paper pays particular attention to the issue of client-side model heterogeneity, a pervasive...Towards Principled Representation Learning from Videos for Reinforcement Learning
Dipendra Misra*, Akanksha Saran, Tengyang Xie*, Alex Lamb*, John Langford*
We study pre-training representations for decision-making using video data, which is abundantly available for tasks such as game agents and software testing. Even though significant empirical advances have been made on this problem, a theoretical understanding remains absent...SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer
Yuhta Takida, Masaaki Imaizumi*, Takashi Shibuya, Chieh-Hsin Lai, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji
Generative adversarial networks (GANs) learn a target probability distribution by optimizing a generator and a discriminator with minimax objectives. This paper addresses the question of whether such optimization actually provides the generator with gradients that make its d...Manifold Preserving Guided Diffusion
Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim*, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter*, Ruslan Salakhutdinov*, Stefano Ermon*
Despite the recent advancements, conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training. In this paper, we propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework th...Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion
Dongjun Kim*, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon*
Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade-off quality for speed. To address this limitation, we propose Consistency Trajectory Model (CTM), a generalization encomp...Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators
Wiebke Hutiri*, Orestis Papakyriakopoulos*, Alice Xiang
The rapid and wide-scale adoption of AI to generate human speech poses a range of significant ethical and safety risks to society that need to be addressed. For example, a growing number of speech generation incidents are associated with swatting attacks in the United States...Hearing Anything Anywhere
Mason Long Wang*, Samuel Clarke, Ruohan Gao, Shangzhe Wu, Jiajun Wu
Multimodal representation learning to integrate different modalities, such as text, vision, and audio is important for real-world applications. The symmetric InfoNCE loss proposed in CLIP is a key concept in multimodal representation learning. In this work, we provide a theo...Asynchronous Task Plan Refinement for Multi-Robot Task and Motion Planning
Yoonchang Sung*, Rahul Shome*, Peter Stone
This paper explores general multi-robot task and motion planning, where multiple robots in close proximity manipulate objects while satisfying constraints and a given goal. In particular, we formulate the plan refinement problem--which, given a task plan, finds valid assignm...MyStyle++: A Controllable Personalized Generative Prior
Libing Zeng*, Lele Chen, Yi Xu*, Nima Kalantari*
In this paper, we propose an approach to obtain a personalized generative prior with explicit control over a set of attributes. We build upon MyStyle, a recently introduced method, that tunes the weights of a pre-trained StyleGAN face generator on a few images of an individu...Ethical Considerations for Responsible Data Curation
Jerone Andrews, Dora Zhao*, William Thong, Apostolos Modas, Orestis Papakyriakopoulos*, Alice Xiang
Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. HCCV datasets constructed through nonconsensual web scraping lack crucial metadata for comprehensive fairness and robustnes...Enhancing Semantic Communication with Deep Generative Models -- An ICASSP Special Session Overview
Eleonora Grassucci*, Yuki Mitsufuji, Ping Zhang*, Danilo Comminiello*
Semantic communication is poised to play a pivotal role in shaping the landscape of future AI-driven communication systems. Its challenge of extracting semantic information from the original complex content and regenerating semantically consistent data at the receiver, possi...BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
Takashi Shibuya, Yuhta Takida, Yuki Mitsufuji
Generative adversarial network (GAN)-based vocoders have been intensively studied because they can synthesize high-fidelity audio waveforms faster than real-time. However, it has been reported that most GANs fail to obtain the optimal projection for discriminating between re...VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance
Carlos Hernandez-Olivan*, Koichi Saito, Naoki Murata, Chieh-Hsin Lai, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji
Restoring degraded music signals is essential to enhance audio quality for downstream music manipulation. Recent diffusion-based music restoration methods have demonstrated impressive performance, and among them, diffusion posterior sampling (DPS) stands out given its intrin...Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription
Frank Cwitkowitz*, Kin Wai Cheuk, Woosung Choi, Marco A. Martínez-Ramírez, Keisuke Toyama*, Wei-Hsiang Liao, Yuki Mitsufuji
In recent years, research on music transcription has focused mainly on architecture design and instrument-specific data acquisition. With the lack of availability of diverse datasets, progress is often limited to solo-instrument tasks such as piano transcription. Several wor...Enhancing Diffusion Models with 3D Perspective Geometry Constraints
Rishi Upadhyay*, Howard Zhang*, Yunhao Ba, Ethan Yang*, Blake Gella*, Sicheng Jiang*, Alex Wong*, Achuta Kadambi*
While perspective is a well-studied topic in art, it is generally taken for granted in images. However, for the recent wave of high-quality image synthesis methods such as latent diffusion models, perspective accuracy is not an explicit requirement. Since these methods are c...Posthoc privacy guarantees for collaborative inference with modified Propose-Test-Release
Abhishek Singh*, Praneeth Vepakomma*, Vivek Sharma, Ramesh Raskar*
Cloud-based machine learning inference is an emerging paradigm where users query by sending their data through a service provider who runs an ML model on that data and returns back the answer. Due to increased concerns over data privacy, recent works have proposed Collaborat...VaryNote: A Method to Automatically Vary the Number of Notes in Symbolic Music
Juan M. Huerta*, Bo Liu*, Peter Stone
Automatically varying the number of notes in symbolic music has various applications in assisting music creators to embellish simple tunes or to reduce complex music to its core idea. In this paper, we formulate the problem of varying the number of notes while preserving the...STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
Kazuki Shimada, Archontis Politis*, Parthasaarathy Sudarsanam*, Daniel Krause*, Kengo Uchida, Sharath Adavanne*, Aapo Hakala*, Yuichiro Koyama*, Naoya Takahashi, Shusuke Takahashi*, Tuomas Virtanen*, Yuki Mitsufuji
While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker. This paper pro...CERM: Context-aware Literature-based Discovery via Sentiment Analysis
Julio Christian Young*, Uchenna Akujuobi
Motivated by the abundance of biomedical publications and the need to better understand the relationship between food and health, we study a new sentiment analysis task based on literature-based discovery. Many attempts have been made to introduce health into recipe recomme...Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception?
Xiaoxiao Sun*, Nidham Gazagnadou, Vivek Sharma, Lingjuan Lyu, Hongdong Li*, Liang Zheng*
Hand-crafted image quality metrics, such as PSNR and SSIM, are commonly used to evaluate model privacy risk under reconstruction attacks. Under these metrics, reconstructed images that are determined to resemble the original one generally indicate more privacy leakage. Image...LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
Bo Liu*, Yifeng Zhu*, Chongkai Gao*, Yihao Feng*, Qiang Liu*, Yuke Zhu*, Peter Stone
Lifelong learning offers a promising paradigm of building a generalist agent that learns and adapts over its lifespan. Unlike traditional lifelong learning problems in image and text domains, which primarily involve the transfer of declarative knowledge of entities and conce...FAMO: Fast Adaptive Multitask Optimization
Bo Liu*, Yihao Feng*, Peter Stone, Qiang Liu*
One of the grand enduring goals of AI is to create generalist agents that can learn multiple different tasks from diverse data via multitask learning (MTL). However, gradient descent (GD) on the average loss across all tasks may yield poor multitask performance due to severe...Elden: Exploration via Local Dependencies
Zizhao Wang*, Jiaheng Hu*, Roberto Martin-Martin*, Peter Stone
Tasks with large state space and sparse reward present a longstanding challenge to reinforcement learning. In these tasks, an agent needs to explore the state space efficiently until it finds reward: the hard exploration problem. To deal with this problem, the community has ...f-Policy Gradients: A General Framework for Goal-Conditioned RL using f-Divergences
Siddhant Agarwal*, Ishan Durugkar, Amy Zhang*, Peter Stone
Goal-Conditioned RL problems provide sparse rewards where the agent receives a reward signal only when it has achieved the goal, making exploration a difficult problem. Several works augment this sparse reward with a learned dense reward function, but this can lead to subopt...Differentially Private Image Classification by Learning Priors from Random Processes
Xinyu Tang*, Ashwinee Panda*, Vikash Sehwag, Prateek Mittal*
In privacy-preserving machine learning, differentially private stochastic gradient descent (DP-SGD) performs worse than SGD due to per-sample gradient clipping and noise addition. A recent focus in private learning research is improving the performance of DP-SGD on private da...UltraRE: Enhancing RecEraser for Recommendation Unlearning via Error Decomposition
Yuyuan Li*, Chaochao Chen*, Yizhao Zhang*, Weiming Liu*, Lingjuan Lyu, Xiaolin Zheng*, Dan Meng*, Jun Wang*
With growing concerns regarding privacy in machine learning models, regulations have committed to granting individuals the right to be forgotten while mandating companies to develop non-discriminatory machine learning systems, thereby fueling the study of the machine unlearn...Towards Personalized Federated Learning via Heterogeneous Model Reassembly
Jiaqi Wang*, Xingyi Yang*, Suhan Cui*, Liwei Che*, Lingjuan Lyu, Dongkuan Xu*, Fenglong Ma*
This paper focuses on addressing the practical yet challenging problem of model heterogeneity in federated learning, where clients possess models with different network structures. To tackle this problem, we propose a novel framework called pFedHR, which leverages heterogeneo...Is Heterogeneity Notorious? Taming Heterogeneity to Handle Test-Time Shift in Federated Learning
Yue Tan, Chen Chen, Weiming Zhuang, Xin Dong, Lingjuan Lyu, Guodong Long*
Federated learning (FL) is an effective machine learning paradigm where multiple clients can train models based on heterogeneous data in a decentralized manner without accessing their private data. However, existing FL systems undergo performance deterioration due to feature...Where Did I Come From? Origin Attribution of AI-Generated Images
Zhenting Wang, Chen Chen, Yi Zeng, Lingjuan Lyu, Shiqing Ma*
Image generation techniques have been gaining increasing attention recently, but concerns have been raised about the potential misuse and intellectual property (IP) infringement associated with image generation models. It is, therefore, necessary to analyze the origin of ima...Towards a fuller understanding of neurons with Clustered Compositional Explanations
Biagio La Rosa*, Leilani H. Gilpin*, Roberto Capobianco
Compositional Explanations is a method for identifying logical formulas of concepts that approximate the neurons' behavior. However, these explanations are linked to the small spectrum of neuron activations used to check the alignment (i.e., the highest ones), thus lacking c...Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
Swami Sankaranarayanan, Thomas Hartvigsen*, Hamid Palangi*, Yoon Kim*, Marzyeh Ghassemi*
Deployed models decay over time due to shifting inputs, changing user needs, or emergent knowledge gaps. When harmful behaviors are identified, targeted edits are required. However, current model editors, which adjust specific behaviors of pre-trained models, degrade model p...Automatic Piano Transcription with Hierarchical Frequency-Time Transformer
Keisuke Toyama*, Taketo Akama*, Yukara Ikemiya, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji
Taking long-term spectral and temporal dependencies into account is essential for automatic piano transcription. This is especially helpful when determining the precise onset and offset for each note in the polyphonic piano content. In this case, we may rely on the capabilit...Memory Replay For Continual Learning With Spiking Neural Networks
Michela Proietti*, Alessio Ragno*, Roberto Capobianco
Two of the most impressive features of biological neural networks are their high energy efficiency and their ability to continuously adapt to varying inputs. On the contrary, the amount of power required to train top-performing deep learning models rises as they become more ...Explainable AI in drug discovery: self-interpretable graph neural network for molecular property prediction using concept whi…
Michela Proietti*, Alessio Ragno*, Biagio La Rosa*, Rino Ragno*, Roberto Capobianco
Molecular property prediction is a fundamental task in the field of drug discovery. Several works use graph neural networks to leverage molecular graph representations. Although they have been successfully applied in a variety of applications, their decision process is not t...Understanding Deep RL agent decisions: a novel interpretable approach with trainable prototypes
Caterina Borzillo*, Alessio Ragno*, Roberto Capobianco
Deep reinforcement learning (DRL) models have shown great promise in various applications, but their practical adoption in critical domains is limited due to their opaque decision-making processes. To address this challenge, explainable AI (XAI) techniques aim to enhance tra...FRUNI and FTREE synthetic knowledge graphs for evaluating explainability
Pablo Sanchez Martin, Tarek Besold, Priyadarshini Kumari
Research on knowledge graph completion (KGC), i.e., link prediction within incomplete KGs, is witnessing significant growth in popularity. Recently, KGC using KG embedding (KGE) models, primarily based on complex architectures (e.g., transformers), have achieved remarkable...Extending Audio Masked Autoencoders Toward Audio Restoration
Zhi Zhong*, Hao Shi*, Masato Hirano*, Kazuki Shimada, Kazuya Tateishi*, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji
Audio classification and restoration are among major downstream tasks in audio signal processing. However, restoration derives less of a benefit from pretrained models compared to the overwhelming success of pretrained models in classification tasks. Due to such unbalanced b...Query by Activity Video in the Wild
Tao Hu*, William Thong, Pascal Mettes*, Cees Snoek*
This paper considers retrieval of videos containing human activity from just a video query. In the literature, a common assumption is that all activities have sufficient labelled examples when learning an embedding for retrieval. However, this assumption does not hold in pra...MAS: Towards Resource-Efficient Federated Multiple-Task Learning
Weiming Zhuang, Yonggang Wen*, Shuai Zhang*, Lingjuan Lyu
Federated learning (FL) is an emerging distributed machine learning method that empowers in-situ model training on decentralized edge devices. However, multiple simultaneous FL tasks could overload resource-constrained devices. In this work, we propose the first FL system to...The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning
Virat Shejwalkar, Lingjuan Lyu, Amir Houmansadr*
Semi-supervised machine learning (SSL) is gaining popularity as it reduces the cost of training ML models. It does so by using very small amounts of (expensive, well-inspected) labeled data and large amounts of (cheap, non-inspected) unlabeled data. SSL has shown comparable ...TARGET: Federated Class-Continual Learning via Exemplar-Free Distillation
Jie Zhang*, Chen Chen, Weiming Zhuang, Lingjuan Lyu
This paper focuses on an under-explored yet important problem: Federated Class-Continual Learning (FCCL), where new classes are dynamically added in federated learning. Existing FCCL works suffer from various limitations, such as requiring additional datasets or storing the ...Spatio-Temporal Convolution-Attention Video Network
Ali Diba*, Vivek Sharma, Mohammad.M Arzani*, Luc Van Gool*
In this paper, we present a hierarchical neural network based on convolutional and attention modeling for short and long-range video reasoning, called Spatio-Temporal Convolution-Attention Video Network (STCA). The proposed method is capable of learning appearance and tempor...A Novel Control Law for Multi-joint Human-Robot Interaction Tasks While Maintaining Postural Coordination
Keya Ghonasgi*, Reuth Mirsky*, Adrian M Haith*, Peter Stone, Ashish D Deshpande*
Exoskeleton robots are capable of safe torque-controlled interactions with a wearer while moving their limbs through pre-defined trajectories. However, affecting and assisting the wearer's movements while incorporating their inputs (effort and movements) effectively during a...Symbolic State Space Optimization for Long Horizon Mobile Manipulation Planning
Xiaohan Zhang*, Yifeng Zhu*, Yan Ding*, Yuqian Jiang*, Yuke Zhu*, Peter Stone, Shiqi Zhang*
In existing task and motion planning (TAMP) research, it is a common assumption that experts manually specify the state space for task-level planning. A well-developed state space enables the desirable distribution of limited computational resources between task planning and...NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions
Zhang Chen*, Zhong Li*, Liangchen Song*, Lele Chen, Jingyi Yu*, Yi Xu*
We present a novel type of neural fields that uses general radial bases for signal representation. State-of-the-art neural fields typically rely on grid-based representations for storing local neural features and N-dimensional linear kernels for interpolating features at con...Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting
Wentao Bao*, Lele Chen, Libing Zeng*, Zhong Li*, Yi Xu*, Junsong Yuan*, Yu Kong*
Hand trajectory forecasting from egocentric views is crucial for enabling a prompt understanding of human intentions when interacting with AR/VR systems. However, existing methods handle this problem in a 2D image space which is inadequate for 3D real-world applications. In ...Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color
William Thong, Przemyslaw Joniak*, Alice Xiang
This paper strives to measure apparent skin color in computer vision, beyond a unidimensional scale on skin tone. In their seminal paper Gender Shades, Buolamwini and Gebru have shown how gender classification systems can be biased against women with darker skin tones. While...Event Tables for Efficient Experience Replay
Varun Kompella, Thomas Walsh, Samuel Barrett, Peter R. Wurman, Peter Stone
Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. This paper introduces Stratified Sampling from Event Tables (SSET), whi...Iteratively Improving Speech Recognition and Voice Conversion
Mayank Kumar Singh*, Naoya Takahashi, Naoyuki Onoe*
Many existing works on voice conversion (VC) tasks use automatic speech recognition (ASR) models to ensure linguistic consistency between source and converted samples. However, for low-data-resource domains, training a high-quality ASR remains a challenging task...BRExIt: On Opponent Modelling in Expert Iteration
Daniel Hernandez, Hendrik Baier*, Michael Kaisers*
Finding a best response policy is a central objective in game theory and multi-agent learning, with modern population-based training approaches employing reinforcement learning algorithms as best response oracles to improve play against candidate opponents (typically previou...A Pathway Towards Responsible AI Generated Content
Lingjuan Lyu
AI Generated Content (AIGC) has received tremendous attention within the past few years, with content ranging from image, text, to audio, video, etc. Meanwhile, AIGC has become a double-edged sword and recently received much criticism regarding its responsible usage. In this...RAIN: RegulArization on Input and Network for Black-Box Domain Adaptation
Qucheng Peng*, Zhengming Ding*, Lingjuan Lyu, Lichao Sun*, Chen Chen
Source-Free domain adaptation transits the source-trained model towards target domain without exposing the source data, trying to dispel these concerns about data privacy and security. However, this paradigm is still at risk of data leakage due to adversarial attacks on the ...FedSampling: A Better Sampling Strategy for Federated Learning
Tao Qi*, Fangzhao Wu*, Lingjuan Lyu, Yongfeng Huang*, Xing Xie*
Federated learning (FL) is an important technique for learning models from decentralized data in a privacy-preserving way. Existing FL methods usually uniformly sample clients for local model learning in each round. However, different clients may have significantly different...Reducing Communication for Split Learning by Randomized Top-k Sparsification
Fei Zheng*, Chaochao Chen*, Lingjuan Lyu, Binhui Yao*
Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data
Keziah Naggita*, Julienne LaChance, Alice Xiang
Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centri...Meta-Sift: How to Sift Out a Clean Subset in the Presence of Data Poisoning?
Yi Zeng, Minzhou Pan*, Himanshu Jahagirdar*, Lingjuan Lyu, Ruoxi Jia*
External data sources are increasingly being used to train machine learning (ML) models as the data demand increases. However, the integration of external data into training poses data poisoning risks, where malicious providers manipulate their data to compromise the utility...PrivateRec: Differentially Private Model Training and Online Serving for Federated News Recommendation.
Ruixuan Liu*, Yanlin Wang*, Yang Cao*, Lingjuan Lyu, Weike Pan*, Yun Chen*, Hong Chen*
Collecting and training over sensitive personal data raise severe privacy concerns in personalized recommendation systems, and federated learning can potentially alleviate the problem by training models over decentralized user data. However, a theoretically private solution i...Composing Efficient, Robust Tests for Policy Selection
Dustin Morrill, Thomas Walsh, Daniel Hernandez, Peter R. Wurman, Peter Stone
Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. We introduce RPOSST, a...Model-Based Meta Automatic Curriculum Learning.
Zifan Xu*, Yulin Zhang*, Shahaf S. Shperberg*, Reuth Mirsky*, Yuqian Jiang*, Bo Liu*, Peter Stone
Curriculum learning (CL) has been widely explored to facilitate the learning of hard-exploration tasks in reinforcement learning (RL) by training a sequence of easier tasks, often called a curriculum. While most curricula are built either manually or automatically based on h...Grounding LTLf Specifications in Image Sequences
Elena Umili*, Roberto Capobianco, Giuseppe De Giacomo*
A critical challenge in neuro-symbolic (NeSy) approaches is to handle the symbol grounding problem without direct supervision, that is, mapping high-dimensional raw data into an interpretation over a finite set of abstract concepts with a known meaning, without using labels. ...The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation
Naoya Takahashi, Stefan Uhlich*, Shusuke Takahashi*, Yuki Mitsufuji
This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) without increasing calculation cost. It consists of three components: (i) multi-domain loss (MDL), (ii) bridging operation, which...Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement
Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi*, Yuki Mitsufuji
Although deep neural network (DNN)-based speech enhancement (SE) methods outperform the previous non-DNN-based ones, they often degrade the perceptual quality of generated outputs. To tackle this problem, we introduce a DNN-based generative refiner, Diffiner, aiming to impro...Learning to Synthesize Photorealistic Dual-pixel Images from RGBD frames
Feiran Li, Heng Guo*, Hiroaki Santo*, Fumio Okura*, Yasuyuki Matsushita*
Recent advances in data-driven dual-pixel (DP) research are bottlenecked by the difficulties in reaching large-scale DP datasets, and a photorealistic image synthesis approach appears to be a credible solution. To benchmark the accuracy of various existing DP image simulator...Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting
Yuchen Liu*, Chen Chen, Lingjuan Lyu, Fangzhao Wu*, Sai Wu*, Gang Chen*
Federated learning has exhibited vulnerabilities to Byzantine attacks, where the Byzantine attackers can send arbitrary gradients to the central server to destroy the convergence and performance of the global model. A wealth of defenses have been proposed to defend against B...Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
Junyuan Hong, Yi Zeng, Shuyang Yu*, Lingjuan Lyu, Ruoxi Jia*, Jiayu Zhou*
Data-free knowledge distillation (KD) helps realistically transfer knowledge from a pre-trained model (known as the teacher model) to a smaller model (known as the student model) without access to the original training data used for training the teacher model. However, the s...GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration
Naoki Murata, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Yuki Mitsufuji
Pre-trained diffusion models have been successfully used as priors in a variety of linear inverse problems, where the goal is to reconstruct a signal from noisy linear measurements. However, existing approaches require knowledge of the linear operator. In this paper, we prop...Men Also Do Laundry: Multi-Attribute Bias Amplification
Dora Zhao*, Jerone T. A. Andrews, Alice Xiang
As computer vision systems become more widely deployed, there is increasing concern from both the research community and the public that these systems are not only reproducing but amplifying harmful social biases. The phenomenon of bias amplification, which is the focus of t...Dimension-independent Certified Neural Network Watermarks via Mollifier Smoothing
Jiaxiang Ren*, Yang Zhou*, Jiayin Jin*, Lingjuan Lyu, Da Yan*
Certified_Watermarks is the first to provide a watermark certificate against 𝑙2-norm watermark removal attacks, by leveraging the randomized smoothing techniques for certified robustness to adversarial attacks. However, the randomized smoothing techniques suffer from hardnes...Fast Federated Machine Unlearning with Nonlinear Functional Theory
Tianshi Che*, Yang Zhou*, Zijie Zhang*, Lingjuan Lyu, Ji Liu*, Da Yan*, Dejing Dou*, Jun Huan*
Federated machine unlearning (FMU) aims to remove the influence of a specified subset of training data upon request from a trained federated learning model. Despite achieving remarkable performance, existing FMU techniques suffer from inefficiency due to two sequential opera...Reconstructive Neuron Pruning for Backdoor Defense
Yige Li*, Xixiang Lyu*, Xingjun Ma*, Nodens Koren*, Lingjuan Lyu, Bo Li*, Yu-Gang Jiang*
Deep neural networks (DNNs) have been found to be vulnerable to backdoor attacks, raising security concerns about their deployment in mission-critical applications. While existing defense methods have demonstrated promising results, it is still not clear how to effectively r...Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark.
Wenjun Peng*, Jingwei Yi*, Fangzhao Wu*, Shangxi Wu*, Bin Bin Zhu*, Lingjuan Lyu, Binxing Jiao*, Guangzhong Sun*, Xing Xie*
Large language models (LLMs) have demonstrated powerful capabilities in both text understanding and generation. Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NLP) tasks for customers. H...Augmented data sheets for speech datasets and ethical decision-making
Orestis Papakyriakopoulos*, Anna Seo Gyeong Choi*, William Thong, Dora Zhao*, Jerone Andrews, Rebecca Bourke, Alice Xiang, Allison Koenecke*
Human-centric image datasets are critical to the development of computer vision technologies. However, recent investigations have foregrounded significant ethical issues related to privacy and bias, which have resulted in the complete retraction, or modification, of several ...Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators
Paavo Parmas*, Takuma Seno, Yuma Aoki*
In model-based reinforcement learning (MBRL), policy gradients can be estimated either by derivative-free RL methods, such as likelihood ratio gradients (LR), or by backpropagating through a differentiable model via reparameterization gradients (RP). Instead of using one or ...What's Wrong with Gradient-based Complex Query Answering?
Ouns El Harzli, Samy Badreddine, Tarek Besold
Multi-hop query answering on knowledge graphs is known to be a challenging computational task. Neurosymbolic approaches using neural link predictors have shown promising results but are still outperformed by combinatorial optimization methods on several benchmarks, including...Improving Artificial Intelligence with Games
Peter R. Wurman, Peter Stone, Michael Spranger
Games continue to drive progress in the development of artificial intelligence.
Causal Policy Gradient for Whole-Body Mobile Manipulation
Jiaheng Hu*, Peter Stone, Roberto Martin-Martin*
Developing the next generation of household robot helpers requires combining locomotion and interaction capabilities, which is generally referred to as mobile manipulation (MoMa). MoMa tasks are difficult due to the large action space of the robot and the common multi-objec..."What's That Robot Doing Here?": Factors Influencing Perceptions Of Incidental Encounters With Autonomous Quadruped Robots.
Elliott Hauser*, Yao-Cheng Chan*, Geethika Hemkumar*, Daksh Dua*, Parth Chonkar*, Efren Mendoza Enriquez*, Tiffany Kao*, Shikhar Gupta*, Huihai Wang*, Justin Hart*, Reuth Mirsky*, Joydeep Biswas*, Junfeng Jiao*, Peter Stone
Autonomous service robots in a public setting will generate hundreds of incidental human-robot encounters, yet researchers have only recently addressed this important topic in earnest. In this study, we hypothesized that visual indicators of human control, such as a leash on...Task Phasing: Automated Curriculum Learning from Demonstrations
Vaibhav Bajaj*, Guni Sharon*, Peter Stone
Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals. Common RL techniques for addressing such domains include (1) learning from demonstrations and (2) curriculum learning. While these two approaches ha...Visual Reward Machines
Elena Umili*, Francesco Argenziano*, Aymeric Barbin*, Roberto Capobianco
Non-markovian Reinforcement Learning (RL) tasks are extremely hard to solve, because intelligent agents must consider the entire history of state-action pairs to act rationally in the environment. Most works use Linear Temporal Logic (LTL) to specify temporally-extended task...FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation
Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon*
Score-based generative models learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are tied together by the Fokker-Planck equation (FPE), a partial differentia...A Reflection on How Cross-Cultural Perspectives on the Ethics of Facial Analysis AI Can Inform EU Policymaking
Chiara Ullstein*, Severin Engelmann*, Orestis Papakyriakopoulos*, Jens Grossklags*
The EU AI Act proposal addresses, among other applications, AI systems that enable facial classification and emotion recognition. As part of previous work, we have investigated how citizens deliberate about the validity of AI-based facial classifications in the advertisement...Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich*, Kyogu Lee*, Yuki Mitsufuji
We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio effects related information from a r...Nonparallel Emotional Voice Conversion for unseen speaker-emotion pairs using dual domain adversarial network Virtual Domain …
Nirmesh Shah*, Mayank Kumar Singh*, Naoya Takahashi, Naoyuki Onoe*
The primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another style without modifying the linguistic content of the signal. Most of the state-of-the-art approaches convert emotions for seen speaker-emo...DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability
Kin Wai Cheuk, Toshimitsu Uesaka, Naoki Murata, Naoya Takahashi, Shusuke Takahashi*, Dorien Herremans*, Yuki Mitsufuji
In this paper we propose a novel generative approach, DiffRoll, to tackle automatic music transcription (AMT). Instead of treating AMT as a discriminative task in which the model is trained to convert spectrograms into piano rolls, we think of it as a conditional generative t...Hierarchical Diffusion Models for Singing Voice Neural Vocoder
Naoya Takahashi, Mayank Kumar Singh*, Yuki Mitsufuji
Recent progress in deep generative models has improved the quality of neural vocoders in speech domain. However, generating a high-quality singing voice remains challenging due to a wider variety of musical expressions in pitch, loudness, and pronunciations. In this work, we...Towards Adversarially Robust Continual Learning
Tao Bai, Chen Chen, Lingjuan Lyu, Jun Zhao*, Bihan Wen*
Recent studies show that models trained by continual learning can achieve the comparable performances as the standard supervised learning and the learning flexibility of continual learning models enables their wide applications in the real world. Deep learning models, howeve...Unsupervised vocal dereverberation with diffusion-based generative models
Koichi Saito, Naoki Murata, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuhta Takida, Takao Fukui*, Yuki Mitsufuji
Removing reverb from reverberant music is a necessary technique to clean up audio for downstream music manipulations. Reverberation of music contains two categories, natural reverb, and artificial reverb. Artificial reverb has a wider diversity than natural reverb due to its...Kinematic coordinations capture learning during human-exoskeleton interaction
Keya Ghonasgi*, Reuth Mirsky*, Nisha Bhargava*, Adrian M Haith*, Peter Stone, Ashish D Deshpande*
Human–exoskeleton interactions have the potential to bring about changes in human behavior for physical rehabilitation or skill augmentation. Despite significant advances in the design and control of these robots, their application to human training remains limited. The key o...MocoSFL: enabling cross-client collaborative self-supervised learning
Jingtao Li, Lingjuan Lyu, Daisuke Iso, Chaitali Chakrabarti*, Michael Spranger
Existing collaborative self-supervised learning (SSL) schemes are not suitable for cross-client applications because of their expensive computation and large local data requirements. To address these issues, we propose MocoSFL, a collaborative SSL framework based on Split Fe...IDEAL: Query-Efficient Data-Free Learning from Black-Box Models
Jie Zhang*, Chen Chen, Lingjuan Lyu
Knowledge Distillation (KD) is a typical method for training a lightweight student model with the help of a well-trained teacher model. However, most KD methods require access to either the teacher's training data or model parameter, which is unrealistic. To tackle this prob...D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning.
Caroline Wang*, Garrett Warnell*, Peter Stone
While combining imitation learning (IL) and reinforcement learning (RL) is a promising way to address poor sample efficiency in autonomous behavior acquisition, methods that do so typically assume that the requisite behavior demonstrations are provided by an expert that be...MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection
Jiaxun Cui*, Xiaomeng Yang*, Mulong Luo*, Geunbae Lee*, Peter Stone, Hsien-Hsin S. Lee*, Benjamin Lee*, G. Edward Suh*, Wenjie Xiong*, Yuandong Tian*
Security vulnerabilities in computer systems raise serious concerns as computers process an unprecedented amount of private and sensitive data today. Cache timing attacks (CTA) pose an important practical threat as they can effectively breach many protection mechanisms in t...CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong*, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley*, Taylor Berg-Kirkpatrick*
Recent years have seen progress beyond domain-specific sound separation for speech or music towards universal sound separation for arbitrary sounds. Prior work on universal sound separation has investigated separating a target sound out of an audio mixture given a text query...Twofer: Tackling Continual Domain Shift with Simultaneous Domain Generalization and Adaptation
Chenxi Liu*, Lixu Wang, Lingjuan Lyu, Chen Sun*, Xiao Wang*, Qi Zhu*
In real-world applications, deep learning models often run in non-stationary environments where the target data distribution continually shifts over time. There have been numerous domain adaptation (DA) methods in both online and offline modes to improve cross-domain adaptat...MECTA: Memory-Economic Continual Test-Time Model Adaptation
Junyuan Hong, Lingjuan Lyu, Jiayu Zhou*, Michael Spranger
Continual Test-time Adaptation (CTA) is a promising art to secure accuracy gains in continually-changing environments. The state-of-the-art adaptations improve out-of-distribution model accuracy via computation-efficient online test-time gradient descents but meanwhile cost ...Learning Perceptual Hallucination for Multi-Robot Navigation in Narrow Hallways
Jin-Soo Park*, Xuesu Xiao*, Garrett Warnell*, Harel Yedidsion*, Peter Stone
While current systems for autonomous robot navigation can produce safe and efficient motion plans in static environments, they usually generate suboptimal behaviors when multiple robots must navigate together in confined spaces. For example, when two robots meet each other i...Benchmarking Reinforcement Learning Techniques for Autonomous Navigation
Zifan Xu*, Bo Liu*, Xuesu Xiao*, Anirudh Nair*, Peter Stone
Deep reinforcement learning (RL) has brought many successes for autonomous robot navigation. However, there still exist important limitations that prevent real-world use of RL-based navigation systems. For example, most learning approaches lack safety guarantees; and learned na...A View From Somewhere: Human-Centric Face Representations
Jerone T. A. Andrews, Przemyslaw Joniak*, Alice Xiang
Few datasets contain self-identified sensitive attributes, inferring attributes risks introducing additional biases, and collecting attributes can carry legal risks. Besides, categorical labels can fail to reflect the continuous nature of human phenotypic diversity, making i...Towards Robustness Certification Against Universal Perturbations
Yi Zeng, Zhouxing Shi*, Ming Jin*, Feiyang Kang*, Lingjuan Lyu, Cho-Jui Hsieh*, Ruoxi Jia*
In this paper, we investigate the problem of certifying neural network robustness against universal perturbations (UPs), which have been widely used in universal adversarial attacks and backdoor attacks. Existing robustness certification methods aim to provide robustness gua...Minimum Topology Attacks for Graph Neural Networks
Mengmei Zhang*, Xiao Wang*, Chuan Shi*, Lingjuan Lyu, Tianchi Yang*, Junping Du*
With the great popularity of Graph Neural Networks (GNNs), their robustness to adversarial topology attacks has received increasing attention. Although many attack methods have been proposed, they mainly focus on fixed-budget attacks, aiming at finding the most adversarial p...An Overview of Environmental Features that Impact Deep Reinforcement Learning in Sparse-Reward Domains
Jim Martin Catacora Ocaña*, Roberto Capobianco, Daniele Nardi*
Deep reinforcement learning has achieved impressive results in recent years; yet, it is still severely troubled by environments showcasing sparse rewards. On top of that, not all sparse-reward environments are created equal, i.e., they can differ in the presence or absence of ...T50: T-PAIR: Temporal Node-pair Embedding for Automatic Biomedical Hypothesis Generation (Extended abstract)
Uchenna Akujuobi, Michael Spranger, Sucheendra Palaniappan*, Xiangliang Zhang*
In this paper, we study an automatic hypothesis generation (HG) problem, which refers to the discovery of meaningful implicit connections between scientific terms, including but not limited to diseases, chemicals, drugs, and genes extracted from databases of biomedical publi...A self-interpretable module for deep image classification on small data
Jim Martin Catacora Ocaña*, Roberto Capobianco, Daniele Nardi*
Deep neural networks are the driving force of the recent explosion of machine learning applications in everyday life. However, they usually require a lot of training data to work well, and they act as black-boxes, making predictions without any explanation about them. This p...Upvotes? Downvotes? No Votes? Understanding the relationship between reaction mechanisms and political discourse on Reddit
Orestis Papakyriakopoulos*, Severin Engelmann*, Amy Winecoff*
A significant share of political discourse occurs online on social media platforms. Policymakers and researchers try to understand the role of social media design in shaping the quality of political discourse around the globe. In the past decades, scholarship on political di...Being 'Seen' vs. 'Mis-Seen': Tensions between Privacy and Fairness in Computer Vision
Alice Xiang
The rise of facial recognition and related computer vision technologies has been met with growing anxiety over the potential for artificial intelligence (“AI”) to create mass surveillance systems and further entrench societal biases. These concerns have led to calls for grea...Machine Learning Security in Industry: A Quantitative Survey
L. Bieringer*, K. Grosse*, Tarek Besold, B. Biggio*, K. Krombholz*
Despite the large body of academic work on machine learning security, little is known about the occurrence of attacks on machine learning systems in the wild. In this paper, we report on a quantitative study with 139 industrial practitioners. We analyze attack occurrence and...A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems
Megan M. Baker*, Alexander New*, Mario Aguilar-Simon*, Ziad Al-Halah*, Sébastien M. R. Arnold*, Ese Ben-Iwhiwhu*, Andrew P. Brna*, Ethan Brooks*, Ryan C. Brown*, Zachary Daniels*, Anurag Daram*, Fabien Delattre*, Ryan Dellana*, Eric Eaton*, Haotian Fu*, Kristen Grauman*, Jesse Hostetler*, Shariq Iqbal*, Cassandra Kent*, Nicholas Ketz*, Soheil Kolouri*, George Konidaris*, Dhireesha Kudithipudi*, Seungwon Lee*, Michael L. Littman*, Sandeep Madireddy*, Jorge A. Mendez*, Eric Q. Nguyen*, Christine D. Piatko*, Praveen K. Pilly*, Aswin Raghavan*, Abrar Rahman*, Santhosh Kumar Ramakrishnan*, Neale Ratzlaff*, Andrea Soltoggio*, Peter Stone, Indranil Sur*, Zhipeng Tang*, Saket Tiwari*, Kyle Vedder*, Felix Wang*, Zifan Xu*, Angel Yanguas-Gil*, Harel Yedidsion*, Shangqun Yu*, Gautam K. Vallabha*
Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to “real world” events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and syst...Reward (Mis)design for autonomous driving
W. Bradley Knox*, Alessandro Allievi*, Holger Banzhaf*, Felix Schmitt*, Peter Stone
This article considers the problem of diagnosing certain common errors in reward design. Its insights are also applicable to the design of cost functions and performance metrics more generally. To diagnose common errors, we develop 8 simple sanity checks for identifying flaw...The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications
Serena Booth*, W. Bradley Knox*, Julie Shah*, Scott Niekum*, Peter Stone, Alessandro Allievi*
In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance metric is often sparse. For example, a true task metric might encode a reward of 1 upon success and 0 otherwise. These sparse task metrics can be hard to learn from, so in pr...Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning
Bo Liu*, Yihao Feng*, Qiang Liu*, Peter Stone
Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications, including manipulation and navigation problems in robotics. Especially in such robotics tasks, sample efficiency is of the utmost importance for GCRL since, by default, the ...Defending Against Backdoor Attacks in Natural Language Generation
Xiaofei Sun*, Xiaoya Li*, Yuxian Meng*, Xiang Ao*, Lingjuan Lyu, Jiwei Li*, Tianwei Zhang*
The frustratingly fragile nature of neural network models makes current natural language generation (NLG) systems prone to backdoor attacks and generate malicious sequences that could be sexist or offensive. Unfortunately, little effort has been invested to how backdoor attac...Delving into the Adversarial Robustness of Federated Learning
Zijie Zhang*, Bo Li*, Chen Chen, Lingjuan Lyu, Shuang Wu*, Shouhong Ding*, Chao Wu*
In Federated Learning (FL), models are as fragile as centrally trained models against adversarial examples. However, the adversarial robustness of federated learning remains largely unexplored. This paper casts light on the challenge of adversarial robustness of federated le...Considerations for Ethical Speech Recognition Datasets
Orestis Papakyriakopoulos*, Alice Xiang
Speech AI Technologies are largely trained on publicly available datasets or by the massive web-crawling of speech. In both cases, data acquisition focuses on minimizing collection effort, without necessarily taking the data subjects’ protection or user needs into considerat...Multimodal Embodied Attribute Learning by Robots for Object-Centric Action Policies.
Xiaohan Zhang*, Saeid Amiri*, Jivko Sinapov*, Jesse Thomason*, Peter Stone, Shiqi Zhang*
Robots frequently need to perceive object attributes, such as red, heavy, and empty, using multimodal exploratory behaviors, such as look, lift, and shake. One possible way for robots to do so is to learn a classifier for each perceivable attribute given an exploratory behav...State of the art of visual analytics for explainable deep learning
Biagio La Rosa*, Graziano Blasilli*, R Bourqui*, D Auber*, Giuseppe Santucci*, Roberto Capobianco, Enrico Bertini*, Romain Giot*, Marco Angelini*
The use and creation of machine-learning-based solutions to solve problems or reduce their computational costs are becoming increasingly widespread in many domains. Deep Learning plays a large part in this growth. However, it has drawbacks such as a lack of explainability an...DM2: Distributed Multi-Agent Reinforcement Learning via Distribution Matching
Caroline Wang*, Ishan Durugkar, Elad Liebman*, Peter Stone
Current approaches to multi-agent cooperation rely heavily on centralized mechanisms or explicit communication protocols to ensure convergence. This paper studies the problem of distributed multi-agent learning without resorting to centralized components or explicit communic...BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
Bo Liu*, Mao Ye*, Stephen Wright*, Peter Stone, Qiang Liu*
Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning. Conventional BO methods need to differentiate through the...Causality for Temporal Unfairness Evaluation and Mitigation
Aida Rahmattalabi, Alice Xiang
Recent interest in causality for fair decision-making systems has been accompanied by great skepticism due to practical and epistemological challenges with applying existing causal fairness approaches. Existing works mainly seek to remove the causal effect of social categ...A View From Somewhere: Human-Centric Face Representations
Jerone T. A. Andrews, Przemyslaw Joniak*, Alice Xiang
We propose to implicitly learn a set of continuous face-varying dimensions, without ever asking an annotator to explicitly categorize a person. We uncover the dimensions by learning on a novel dataset of 638,180 human judgments of face similarity (FAX). We demonstrate the ut...A View From Somewhere: Human-Centric Face Representations
Jerone T. A. Andrews, Przemyslaw Joniak*, Alice Xiang
Biases in human-centric computer vision models are often attributed to a lack of sufficient data diversity, with many demographics insufficiently represented. However, auditing datasets for diversity can be difficult, due to an absence of ground-truth labels of relevant feat...Proppo: a Message Passing Framework for Customizable and Composable Learning Algorithms
Paavo Parmas*, Takuma Seno
While existing automatic differentiation (AD) frameworks allow flexibly composing model architectures, they do not provide the same flexibility for composing learning algorithms: everything has to be implemented in terms of back propagation. To address this gap, we invent A...Value Function Decomposition for Iterative Design of Reinforcement Learning Agents
James MacGlashan, Evan Archer, Alisa Devlic, Takuma Seno, Craig Sherstan, Peter R. Wurman, Peter Stone
Designing reinforcement learning (RL) agents is typically a difficult process that requires numerous design iterations. Learning can fail for a multitude of reasons and standard RL methods provide too few tools to provide insight into the exact cause. In this paper, we show ...Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling
Junyuan Hong, Lingjuan Lyu, Jiayu Zhou*, Michael Spranger
As deep learning blooms with growing demand for computation and data resources, outsourcing model training to a powerful cloud server becomes an attractive alternative to training at a low-power and cost-effective end device. Traditional outsourcing requires uploading device...Calibrated Federated Adversarial Training with Label Skewness
Chen Chen, Yuchen Liu*, Xingjun Ma*, Lingjuan Lyu
Recent studies have shown that, like traditional machine learning, federated learning (FL) is also vulnerable to adversarial attacks. To improve the adversarial robustness of FL, few federated adversarial training (FAT) methods have been proposed to apply adversarial training...DENSE: Data-Free One-Shot Federated Learning
Jie Zhang*, Chen Chen, Bo Li*, Lingjuan Lyu, Shuang Wu*, Shouhong Ding*, Chunhua Shen*, Chao Wu*
One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round. Despite the low communication cost, existing one-shot FL methods are mostly impractical or face inherent limitatio...CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
Xuanli He*, Qiongkai Xu*, Yi Zeng, Lingjuan Lyu, Fangzhao Wu*, Jiwei Li*, Ruoxi Jia*
Previous works have validated that text generation APIs can be stolen through imitation attacks, causing IP violations. In order to protect the IP of text generation APIs, a recent work has introduced a watermarking algorithm and utilized the null-hypothesis test as a post-h...Prompt Certified Machine Unlearning with Randomized Gradient Smoothing and Quantization
Zijie Zhang*, Xin Zhao*, Tianshi Che*, Yang Zhou*, Lingjuan Lyu
The right to be forgotten calls for efficient machine unlearning techniques that make trained machine learning models forget a cohort of data. The combination of training and unlearning operations in traditional machine unlearning methods often leads to the expensive computa...FairVFL: A Fair Vertical Federated Learning Framework with Contrastive Adversarial Learning
Tao Qi*, Fangzhao Wu*, Chuhan Wu*, Lingjuan Lyu, Tong Xu*, Hao Liao*, Zhongliang Yang*, Yongfeng Huang*, Xing Xie*
Vertical federated learning (VFL) is a privacy-preserving machine learning paradigm that can learn models from features distributed on different platforms in a privacy-preserving way. Since in real-world applications the data may contain bias on fairness-sensitive features (...d3rlpy: An Offline Deep Reinforcement Learning Library
Takuma Seno, Michita Imai*
In this paper, we introduce d3rlpy, an open-sourced offline deep reinforcement learning (RL) library for Python. d3rlpy supports a set of offline deep RL algorithms as well as off-policy online algorithms via a fully documented plug-and-play API. To address a reproducibility...RecipeMind: Guiding Ingredient Choices from Food Pairing to Recipe Completion using Cascaded Set Transformer
We propose a computational approach for recipe ideation, a downstream task that helps users select and gather ingredients for creating dishes. To perform this task, we developed RecipeMind, a food affinity score prediction model that quantifies the suitability of adding an i...
Content-Diverse Comparisons improve IQA
William Thong, Jose Costa Pereira*, Sarah Parisot*, Ales Leonardis*, Steven McDonagh*
Image quality assessment (IQA) forms a natural and often straightforward undertaking for humans, yet effective automation of the task remains highly challenging. Recent metrics from the deep learning community commonly compare image pairs during training to improve upon trad...Prototype-based Interpretable Graph Neural Networks
Alessio Ragno*, Biagio La Rosa*, Roberto Capobianco
Graph neural networks have proved to be a key tool for dealing with many problems and domains such as chemistry, natural language processing and social networks. While the structure of the layers is simple, it is difficult to identify the patterns learned by the graph neural...FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation
Ziqing Fan*, Yanfeng Wang*, Jiangchao Yao*, Lingjuan Lyu, Ya Zhang*, Qi Tian*
The statistical heterogeneity of the non-independent and identically distributed (non-IID) data in local clients significantly limits the performance of federated learning. Previous attempts like FedProx, SCAFFOLD, MOON, FedNova and FedDyn resort to an optimization perspecti...Privacy and Robustness in Federated Learning: Attacks and Defenses
Lingjuan Lyu, Han Yu*, Xingjun Ma*, Chen Chen, Lichao Sun*, Jun Zhao*, Qiang Yang*, Philip S. Yu*
As data are increasingly being stored in different silos and societies becoming more aware of data privacy issues, the traditional centralized training of artificial intelligence (AI) models are facing efficiency and privacy challenges. Recently, federated learning (FL) has ...Human-Centric Visual Diversity Auditing
Jerone T. A. Andrews, Przemyslaw Joniak*, Alice Xiang
Biases in human-centric computer vision models are often attributed to a lack of sufficient data diversity, with many demographics insufficiently represented. However, auditing datasets for diversity can be difficult, due to an absence of ground-truth labels of relevant feat...Attrition of Workers with Minoritized Identities on AI Teams
Jeffrey Brown*, Tina Park*, Jiyoo Chang*, McKane Andrus*, Alice Xiang, Christine Custis*
The effects of AI systems are far-reaching and affect diverse communities all over the world. The demographics of AI teams, however, do not reflect this diversity. Instead, these teams, particularly at big tech companies, are dominated by Western, White, and male workers...Beyond Model Extraction: Imitation Attack for Black-Box NLP APIs
Qiongkai Xu*, Xuanli He*, Lingjuan Lyu, Lizhen Qu*, Gholamreza Haffari*
Machine-learning-as-a-service (MLaaS) has attracted millions of users to their splendid large-scale models. Although published as black-box APIs, the valuable models behind these services are still vulnerable to imitation attacks. Recently, a series of works have demonstrate...Cross-Network Social User Embedding with Hybrid Differential Privacy Guarantees
Jiaqian Ren*, Lei Jiang*, Hao Peng*, Lingjuan Lyu, Zhiwei Liu*, Chaochao Chen*, Jia Wu*, Xu Bai*, Philip S. Yu*
Integrating multiple online social networks (OSNs) has important implications for many downstream social mining tasks, such as user preference modelling, recommendation, and link prediction. However, it is unfortunately accompanied by growing privacy concerns about leaking s...Quantifying Changes in Kinematic Behavior of a Human-Exoskeleton Interactive System
Keya Ghonasgi*, Reuth Mirsky*, Adrian M Haith*, Peter Stone, Ashish D Deshpande*
While human-robot interaction studies are becoming more common, quantification of the effects of repeated interaction with an exoskeleton remains unexplored. We draw upon existing literature in human skill assessment and present extrinsic and intrinsic performance metrics t...AI-Competent Individuals and Laypeople Tend to Oppose Facial Analysis AI
Chiara Ullstein*, Severin Engelmann*, Orestis Papakyriakopoulos*, Michel Hohendanner*, Jens Grossklags*
Recent advances in computer vision analysis have led to a debate about the kinds of conclusions artificial intelligence (AI) should make about people based on their faces. Some scholars have argued for supposedly “common sense” facial inferences that can be reliably drawn ...Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models
Zhiyuan Zhang*, Lingjuan Lyu, Xingjun Ma*, Chenguang Wang*, Xu Sun*
Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks. In Natural Language Processing (NLP), DNNs are often backdoored during the fine-tuning process of a large-scale Pre-trained Language Model (PLM) with poisoned samples. Although the clean weights of P...Extracted BERT Model Leaks More Information than You Think!
Xuanli He*, Chen Chen, Lingjuan Lyu, Qiongkai Xu*
The collection and availability of big data, combined with advances in pre-trained models (e.g. BERT), have revolutionized the predictive performance of natural language processing tasks. This allows corporations to provide machine learning as a service (MLaaS) by encapsulat...Grounding LTLf specifications in images
Elena Umili*, Roberto Capobianco, Giuseppe De Giacomo*
A critical challenge in neurosymbolic approaches is to handle the symbol grounding problem without direct supervision. That is mapping high-dimensional raw data into an interpretation over a finite set of abstract concepts with a known meaning, without using labels. In this ...SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
Yuhta Takida, Takashi Shibuya, Wei-Hsiang Liao, Chieh-Hsin Lai, Junki Ohmura*, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi*, Toshiyuki Kumakura*, Yuki Mitsufuji
One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, also known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some...Ligand-based and structure-based studies to develop predictive models for SARS-CoV-2 main protease inhibitors through the 3d-…
Eleonora Proia*, Alessio Ragno*, Lorenzo Antonini*, Manuela Sabatino*, Milan Mladenovič*, Roberto Capobianco, Rino Ragno*
The main protease (Mpro) of SARS-CoV-2 is the essential enzyme for maturation of functional proteins implicated in viral replication and transcription. The peculiarity of its specific cleavage site joint with its high degree of conservation among all coronaviruses promote it...Deep Reinforcement Learning for Pin-Point Autonomous Lunar Landing: Trajectory Recalculation for Obstacle Avoidance
Giulia Ciabatti*, Dario Spiller, Shreyansh Daftry*, Roberto Capobianco, Fabio Curti*
This work aims to present a method to perform autonomous precision landing—pin-point landing—on a planetary environment and perform trajectory recalculation for fault recovery where necessary. In order to achieve this, we choose to implement a Deep Reinforcement Learning—DRL...Interpretable Relational Representations for Food Ingredient Recommendation Systems
Kana Maruyama, Michael Spranger
Supporting chefs with ingredient recommender systems to create new recipes is challenging, as good ingredient combinations depend on many factors like taste, smell, cuisine style, texture, chef’s preference and many more. Useful machine learning models do need to be accurate...Privacy for Free: How does Dataset Condensation Help Privacy?
Tian Dong, Bo Zhao*, Lingjuan Lyu
To prevent unintentional data leakage, research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of the data privacy, existing solutions suffer from either expensive training cost or poor general...Accelerated Federated Learning with Decoupled Adaptive Optimization
Jiayin Jin*, Jiaxiang Ren*, Yang Zhou*, Lingjuan Lyu, Ji Liu*, Dejing Dou*
The federated learning (FL) framework enables edge clients to collaboratively learn a shared inference model while keeping privacy of training data on clients. Recently, many heuristics efforts have been made to generalize centralized adaptive optimization methods, such as S...A Federated Graph Neural Network Framework for Privacy-Preserving Personalization
Yongfeng Huang*, Chuhan Wu*, Fangzhao Wu*, Lingjuan Lyu, Tao Qi*, Xing Xie*
Graph neural network (GNN) is effective in modeling high-order interactions and has been widely used in various personalized applications such as recommendation. However, mainstream personalization methods rely on centralized GNN learning on global graphs, which have conside...Heterogeneous Graph Node Classification with Multi-Hops Relation Features
Xiaolong Xu*, Lingjuan Lyu, Hong Jin*, Weiqiang Wang*, Shuo Jia*
In recent years, knowledge graph (KG) has obtained many achievements in both research and industrial fields. However, most KG algorithms consider node embedding with only structure and node features, but not relation features. In this paper, we propose a novel Heterogeneous ...Distributed Graph Learning with Smooth Priors
Isabela Cunha Maia Nobre*, Mireille El Gheche, Pascal Frossard*
Graph learning is often a necessary step in processing or representing structured data, when the underlying graph is not given explicitly. Graph learning is generally performed centrally with a full knowledge of the graph signals, namely the data that lives on the graph node...How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
Zhiyuan Zhang*, Lingjuan Lyu, Weiqiang Wang*, Lichao Sun*, Xu Sun*
Since training a large-scale backdoored model from scratch requires a large training dataset, several recent attacks have considered to inject backdoors into a trained clean model without altering model behaviors on the clean data. Previous work finds that backdoors can be i...Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification
Chaochao Chen*, Longfei Zheng*, Huiwen Wu*, Lingjuan Lyu, Jun Zhou*, Jia Wu*, Bingzhe Wu*, Ziqi Liu*, Li Wang*, Xiaolin Zheng*
Graph Neural Network (GNN) has achieved remarkable progresses in various real-world tasks on graph data. High-performance GNN models always depend on both rich features and complete edge information in graph. However, such information could possibly be isolated by different ...Dynamic Sparse Training for Deep Reinforcement Learning
Ghada Sokar, Elena Mocanu, Decebal Constantin Mocanu, Mykola Pechenizkiy, Peter Stone
Deep reinforcement learning (DRL) agents are trained through trial-and-error interactions with the environment. This leads to a long training time for dense neural networks to achieve good performance. Hence, prohibitive computation and memory resources are consumed. Recentl...Data-Free Adversarial Knowledge Distillation for Graph Neural Networks
Yuanxin Zhuang*, Lingjuan Lyu, Chuan Shi*, Carl Yang*, Lichao Sun*
Graph neural networks (GNNs) have been widely used in modeling graph structured data, owing to its impressive performance in a wide range of practical applications. Recently, knowledge distillation (KD) for GNNs has enabled remarkable progress in graph model compression and ...Decision Boundary-aware Data Augmentation for Adversarial Training
Chen Chen, Jingfeng Zhang*, Xilie Xu*, Lingjuan Lyu, Chaochao Chen*, Tianlei Hu*, Gang Chen*
Adversarial training (AT) is a typical method to learn adversarially robust deep neural networks via training on the adversarial variants generated by their natural examples. However, as training progresses, the training data becomes less attackable, which may undermine the ...Wasserstein-based Graph Alignment
Hermina Petric Maretic*, Mireille El Gheche, Giovanni Chierchia*, Pascal Frossard*
We propose a novel method for comparing non-aligned graphs of different sizes, based on the Wasserstein distance between graph signal distributions induced by the respective graph Laplacian matrices. Specifically, we cast a new formulation for the one-to-many graph alignment...Communication-Efficient Federated Learning via Knowledge Distillation
Yongfeng Huang*, Chuhan Wu*, Fangzhao Wu*, Lingjuan Lyu, Xing Xie*
Federated learning is a privacy-preserving machine learning technique to train intelligent models from decentralized data, which enables exploiting private data by communicating local model updates in each iteration of model learning rather than the raw data. However, model ...Practical Attribute Reconstruction Attack Against Federated Learning
Chen Chen, Lingjuan Lyu, Han Yu*, Gang Chen*
Existing federated learning (FL) designs have been shown to exhibit vulnerabilities which can be exploited by adversaries to compromise data privacy. However, most current works conduct attacks by leveraging gradients calculated on a small batch of data. This setting is not ...Enhancing Haptic Distinguishability of Surface Materials with Boosting Technique
Priyadarshini Kumari, Subhasis Chaudhuri*
Discriminative features are crucial for several learning applications, such as object detection and classification. Neural networks are extensively used for extracting discriminative features of images and speech signals. However, the lack of large datasets in the haptics d...Selective particle attention: Rapidly and flexibly selecting features for deep reinforcement learning
Sam Blakeman, Denis Mareschal*
Deep Reinforcement Learning (RL) is often criticized for being data inefficient and inflexible to changes in task structure. Part of the reason for these issues is that Deep RL typically learns end-to-end using backpropagation, which results in task-specific representations....Traffic Anomaly Prediction Based on Joint Static-Dynamic Spatio-Temporal Evolutionary Learning
Xiaoming Liu*, Zhanwei Zhang*, Lingjuan Lyu, Zhaohan Zhang*, Shuai Xiao*, Chao Shen*, Philip Yu*
Accurate traffic anomaly prediction offers an opportunity to save the wounded at the right location in time. However, the complex process of traffic anomaly is affected by both various static factors and dynamic interactions. The recent evolving representation learning provi...Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning
Pete Wurman, Samuel Barrett, Kenta Kawamoto, James MacGlashan, Kaushik Subramanian, Thomas Walsh, Roberto Capobianco, Alisa Devlic, Franziska Eckert, Florian Fuchs, Leilani Gilpin, Piyush Khandelwal, Varun Kompella, Hao Chih Lin, Patrick MacAlpine, Declan Oller, Takuma Seno, Craig Sherstan, Michael D. Thomure, Houmehr Aghabozorgi, Leon Barrett, Rory Douglas, Dion Whitehead Amago, Peter Dürr, Peter Stone, Michael Spranger, Hiroaki Kitano
Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block...Differential Private Knowledge Transfer for Privacy-Preserving Cross-Domain Recommendation
Chaochao Chen*, Huiwen Wu*, Jiajie Su*, Lingjuan Lyu, Xiaolin Zheng*, Li Wang*
Cross Domain Recommendation (CDR) has been widely studied to alleviate the cold-start and data sparsity problems that commonly exist in recommender systems. CDR models can improve the recommendation performance of a target domain by leveraging the data of other source domains...GEAR: A Margin-based Federated Adversarial Training Approach
Chen Chen, Jie Zhang*, Lingjuan Lyu
Previous studies have shown that federated learning (FL) is vulnerable to well-crafted adversarial examples. Some recent efforts tried to combine adversarial training with FL, i.e., federated adversarial training (FAT), in order to achieve adversarial robustness in FL. Howev...Byzantine-resilient Federated Learning via Gradient Memorization
Chen Chen, Lingjuan Lyu, Yuchen Liu*, Fangzhao Wu*, Chaochao Chen*, Gang Chen*
Federated learning (FL) provides a privacy-aware learning framework by enabling a multitude of participants to jointly construct models without collecting their private training data. However, federated learning has exhibited vulnerabilities to Byzantine attacks. Many existi...FedBERT: When Federated Learning Meets Pre-Training
Yuanyishu Tian*, Yao Wan*, Lingjuan Lyu, Dezhong Yao*, Hai Jin*, Lichao Sun*
The fast growth of pre-trained models (PTMs) has brought natural language processing to a new era, which becomes a dominant technique for various natural language processing (NLP) applications. Every user can download weights of PTMs, then fine-tune the weights on a task on ...FedCTR: Federated Native Ad CTR Prediction with Cross Platform User Behavior Data
Chuhan Wu*, Fangzhao Wu*, Lingjuan Lyu, Yongfeng Huang*, Xing Xie*
Native ad is a popular type of online advertisement which has similar forms with the native content displayed on websites. Native ad CTR prediction is useful for improving user experience and platform revenue. However, it is challenging due to the lack of explicit user inten...DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent Transportation
Yu Guo*, Wen Liu*, Jiangtian Nie*, Lingjuan Lyu, Zehui Xiong*, Jiawen Kang*, Han Yu*, Dusit Niyato*
Visual surveillance technology is an indispensable functional component of advanced traffic management systems. It has been applied to perform traffic supervision tasks, such as object detection, tracking and recognition. However, adverse weather conditions, e.g., fog, haze ...Protecting Intellectual Property of Language Generation APIs with Lexical Watermark
Xuanli He*, Qiongkai Xu*, Lingjuan Lyu, Fangzhao Wu*, Chenguang Wang*
Nowadays, due to the breakthrough in natural language generation (NLG), including machine translation, document summarization, image captioning, etc., NLG models have been encapsulated in cloud APIs to serve over half a billion people worldwide and process over one hundred bil...Planetary Environment Prediction Using Generative Modeling
Shrijit Singh*, Shreyansh Daftry*, Roberto Capobianco
Planetary rovers have a limited sensory horizon and operate in environments where limited information about the surrounding terrain is available. The rough and unknown nature of the terrain in planetary environments potentially leads to scenarios where the rover gets stuck an...fGOT: Graph Distances based on Filters and Optimal Transport
Hermina Petric Maretic*, Mireille El Gheche, Giovanni Chierchia*, Pascal Frossard*
Graph comparison deals with identifying similarities and dissimilarities between graphs. A major obstacle is the unknown alignment of graphs, as well as the lack of accurate and inexpensive comparison metrics. In this work we introduce the filter graph distance. It is an opt...Logic Tensor Networks
Samy Badreddine, Artur d'Avila Garcez*, Luciano Serafini*, Michael Spranger
Attempts at combining logic and neural networks into neurosymbolic approaches have been on the increase in recent years. In a neurosymbolic system, symbolic knowledge assists deep learning, which typically uses a sub-symbolic distributed representation, to learn and reason a...Exploiting Data Sparsity in Secure Cross-Platform Social Recommendation
Jamie Cui*, Chaochao Chen*, Lingjuan Lyu, Carl Yang*, Li Wang*
Social recommendation has shown promising improvements over traditional systems since it leverages social correlation data as an additional input. Most existing works assume that all data are available to the recommendation platform. However, in practice, user-item interacti...Anti-Backdoor Learning: Training Clean Models on Poisoned Data
Yige Li*, Xixiang Lyu*, Nodens Koren*, Lingjuan Lyu, Bo Li*, Xingjun Ma*
Backdoor attack has emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting and erasing backdoor triggers, it is still not clear if measures can be taken to avoid the triggers from bein...Gradient Driven Rewards to Guarantee Fairness in Collaborative Machine Learning
Xu Xinyi*, Lingjuan Lyu, Xingjun Ma*, Chenglin Miao*, Chuan-Sheng Foo*, Bryan Kian Hsiang Low*
Collaborative machine learning provides a promising framework for different agents to pool their resources (e.g., data) for a common learning task. In realistic settings where agents are self-interested and not altruistic, they may be unwilling to share data or model without...Expert Human-Level Driving in Gran Turismo Sport Using Deep Reinforcement Learning with Image-based Representation
Ryuji Imamura, Takuma Seno, Kenta Kawamoto, Michael Spranger
When humans play virtual racing games, they use visual environmental information on the game screen to understand the rules within the environments. In contrast, a state-of-the-art realistic racing game AI agent that outperforms human players does not use image-based environ...d3rlpy: An Offline Deep Reinforcement Learning Library
Takuma Seno, Michita Imai*
In this paper, we introduce d3rlpy, an open-sourced offline deep reinforcement learning (RL) library for Python. d3rlpy supports a number of offline deep RL algorithms as well as online algorithms via a user-friendly API. To assist deep RL research and development projects, ...Tafl-ES: Exploring Evolution Strategies for Asymmetrical Board Games
Roberto Gallotta*, Roberto Capobianco
NeuroEvolution Strategies (NES) are a subclass of Evolution Strategies (ES). While their application to games and board games has been studied in the past [11], the current state of the art in most of the games is still held by classic RL models, such as AlphaGo Zero [16]. This...Exploration-Intensive Distractors: Two Environment Proposals and a Benchmarking
Jim Martin Catacora Ocaña*, Roberto Capobianco, Daniele Nardi*
Sparse-reward environments are famously challenging for deep reinforcement learning (DRL) algorithms. Yet, the prospect of solving intrinsically sparse tasks in an end-to-end fashion without any extra reward engineering is highly appealing. Such aspiration has recently led t...Detection Accuracy for Evaluating Compositional Explanations of Units
Sayo M. Makinwa*, Biagio La Rosa*, Roberto Capobianco
The recent success of deep learning models in solving complex problems and in different domains has increased interest in understanding what they learn. Therefore, different approaches have been employed to explain these models, one of which uses human-understandable concept...A Discussion about Explainable Inference on Sequential Data via Memory-Tracking
Biagio La Rosa*, Roberto Capobianco, Daniele Nardi*
The recent explosion of deep learning techniques boosted the application of Artificial Intelligence in a variety of domains, thanks to their high performance. However, performance comes at the cost of interpretability: deep models contain hundreds of nested non-linear operati...Data Poisoning Attacks on Federated Machine Learning
Gan Sun*, Yang Cong*, Jiahua Dong*, Qiang Wang*, Lingjuan Lyu, Ji Liu*
Federated machine learning which enables resource-constrained node devices (e.g., Internet of Things (IoT) devices, smartphones) to establish a knowledge-shared model while keeping the raw data local, could provide privacy preservation and economic benefit by designing an ef...Joint Stance and Rumor Detection in Hierarchical Heterogeneous Graph
Chen Li*, Hao Peng*, Jianxin Li*, Lichao Sun*, Lingjuan Lyu, Lihong Wang*, Philip Yu*, Lifang He*
Recently, large volumes of false or unverified information (e.g., fake news and rumors) appear frequently in emerging social media, which are often discussed on a large scale and widely disseminated, causing bad consequences. Many studies on rumor detection indicate that the...Feature and Label Embedding Spaces Matter in Addressing Image Classifier Bias
William Thong, Cees Snoek*
This paper strives to address image classifier bias, with a focus on both feature and label embedding spaces. Previous works have shown that spurious correlations from protected attributes, such as age, gender, or skin tone, can cause adverse decisions. To balance potential ...Learning Transferable Policies for Autonomous Planetary Landing via Deep Reinforcement Learning
Giulia Ciabatti*, Shreyansh Daftry*, Roberto Capobianco
In this work, we develop an application for autonomous landing, exploiting the properties of Deep Reinforcement Learning and Transfer Learning in order to tackle the problem of planetary landing on unknown or barely-known extra-terrestrial environments by learning good-perfo...RecipeBowl: A Cooking Recommender for Ingredients and Recipes using Set Transformer
Michael Spranger, Kana Maruyama
Countless possibilities of recipe combinations challenge us to determine which additional ingredient goes well with others. In this work, we propose RecipeBowl which is a cooking recommendation system that takes a set of ingredients and cooking tags as input and suggests pos...Extending Real Logic with Aggregate Functions
Samy Badreddine, Michael Spranger
Real Logic is a recently introduced first-order language where formulas have fuzzy truth values in the interval [0, 1] and semantics are defined concretely with real domains. The Logic Tensor Networks (LTN) framework has applied Real Logic to many important AI tasks through ...FLEAM: A Federated Learning Empowered Architecture to Mitigate DDoS in Industrial IoT
Jianhua Li*, Lingjuan Lyu, Ximeng Liu*, Xuyun Zhang*, Xixiang Lyu*
A Novel Attribute Reconstruction Attack in Federated Learning
Lingjuan Lyu, Chen Chen
Federated learning (FL) emerged as a promising learning paradigm to enable a multitude of participants to construct a joint ML model without exposing their private training data. Existing FL designs have been shown to exhibit vulnerabilities which can be exploited by adv...Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog
Jesse Thomason*, Aishwarya Padmakumar*, Jivko Sinapov*, Nick Walker*, Yuqian Jiang*, Harel Yedidsion*, Justin Hart*, Peter Stone, Raymond J. Mooney*
In this work, we present methods for using human-robot dialog to improve language understanding for a mobile robot agent. The agent parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red a...Agent-Based Markov Modeling for Improved COVID-19 Mitigation Policies
Roberto Capobianco, Varun Kompella, James Ault*, Guni Sharon*, Stacy Jong*, Spencer Fox*, Lauren Meyers*, Pete Wurman, Peter Stone
The year 2020 saw the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world have been faced with the challenge of protecting public health while keeping the economy running to the greatest extent possible. Epidemiologi...Reconciling Legal and Technical Approaches to Algorithmic Bias
Alice Xiang
In recent years, there has been a proliferation of papers in the algorithmic fairness literature proposing various technical definitions of algorithmic bias and methods to mitigate bias. Whether these algorithmic bias mitigation methods would be permissible from a legal pers...Autonomous Planetary Landing via Deep Reinforcement Learning and Transfer Learning
Giulia Ciabatti*, Shreyansh Daftry*, Roberto Capobianco
The aim of this work is to develop an application for autonomous landing. We exploit the properties of Deep Reinforcement Learning and Transfer Learning, in order to tackle the problem of planetary landing on unknown or barely-known extra-terrestrial environments by learning...Autonomous Overtaking in Gran Turismo Sport Using Curriculum Reinforcement Learning
Yunlong Song*, Hao Chih Lin, Elia Kaufmann*, Peter Dürr, Davide Scaramuzza*
Professional race-car drivers can execute extreme overtaking maneuvers. However, existing algorithms for autonomous overtaking either rely on simplified assumptions about the vehicle dynamics or try to solve expensive trajectory-optimization problems online. When the vehicle...Efficient Real-Time Inference in Temporal Convolution Networks
Piyush Khandelwal, James MacGlashan, Pete Wurman, Peter Stone
It has been recently demonstrated that Temporal Convolution Networks (TCNs) provide state-of-the-art results in many problem domains where the input data is a time-series. TCNs typically incorporate information from a long history of inputs (the receptive field) into a singl...On the Validity of Arrest as a Proxy for Offense: Race and the Likelihood of Arrest for Violent Crimes
Riccardo Fogliato*, Alice Xiang, Zachary Lipton*, Daniel Nagin*, Alexandra Chouldechova*
The risk of re-offense is considered in decision-making at many stages of the criminal justice system, from pre-trial, to sentencing, to parole. To aid decision makers in their assessments, institutions increasingly rely on algorithmic risk assessment instruments (RAIs). The...Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty
Umang Bhatt*, Javier Antorán*, Yunfeng Zhang*, Q. Vera Liao*, Prasanna Sattigeri*, Riccardo Fogliato*, Gabrielle Melançon*, Ranganath Krishnan*, Jason Stanley*, Omesh Tickoo*, Lama Nachman*, Rumi Chunara*, Madhu*
Algorithmic transparency entails exposing system properties to various stakeholders for purposes that include understanding, improving, and contesting predictions. Until now, most research into algorithmic transparency has predominantly focused on explainability. Explainabil...Multiagent Epidemiologic Inference through Realtime Contact Tracing
Guni Sharon*, James Ault*, Peter Stone, Varun Kompella, Roberto Capobianco
This paper addresses an epidemiologic inference problem where, given realtime observation of test results, presence of symptoms, and physical contacts, the most likely infected individuals need to be inferred. The inference problem is modeled as a hidden Markov model where inf...Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning
Florian Fuchs, Yunlong Song*, Elia Kaufmann*, Davide Scaramuzza*, Peter Dürr
Autonomous car racing is a major challenge in robotics. It raises fundamental problems for classical approaches such as planning minimum-time trajectories under uncertain dynamics and controlling the car at the limits of its handling. Besides, the requirement of minimizing t..."What We Can’t Measure, We Can’t Understand": Challenges to Demographic Data Procurement in the Pursuit of Fairness
McKane Andrus*, Elena Spitzer*, Jeffrey Brown*, Alice Xiang
As calls for fair and unbiased algorithmic systems increase, so too does the number of individuals working on algorithmic fairness in industry. However, these practitioners often do not have access to the demographic data they feel they need to detect bias in practice. Even ...Expected Value of Communication for Planning in Ad Hoc Teamwork
William Macke*, Reuth Mirsky*, Peter Stone
A desirable goal for autonomous agents is to be able to coordinate on the fly with previously unknown teammates. Known as "ad hoc teamwork", enabling such a capability has been receiving increasing attention in the research community. One of the central challenges in ad hoc ...Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks
Yuqian Jiang*, Sudarshanan Bharadwaj*, Bo Wu*, Rishi Shah*, Ufuk Topcu*, Peter Stone
In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted reward formulation. As usual, learning an optimal policy in this setting typically requires a large amount of training experiences. Reward...Goal Blending for Responsive Shared Autonomy in a Navigating Vehicle
Yu-Sian Jiang*, Garrett Warnell*, Peter Stone
Human-robot shared autonomy techniques for vehicle navigation hold promise for reducing a human driver's workload, ensuring safety, and improving navigation efficiency. However, because typical techniques achieve these improvements by effectively removing human control at cr...A Penny for Your Thoughts: The Value of Communication in Ad Hoc Teamwork
Reuth Mirsky*, William Macke*, Andy Wang*, Harel Yedidsion*, Peter Stone
In ad hoc teamwork, multiple agents need to collaborate without having knowledge about their teammates or their plans a priori. A common assumption in this research area is that the agents cannot communicate. However, just as two random people may speak the same language, au...Balancing Individual Preferences and Shared Objectives in Multiagent Reinforcement Learning
Ishan Durugkar, Elad Liebman*, Peter Stone
In multiagent reinforcement learning scenarios, it is often the case that independent agents must jointly learn to perform a cooperative task. This paper focuses on such a scenario in which agents have individual preferences regarding how to accomplish the shared task. We co...Explainable Inference on Sequential Data via Memory-Tracking
Biagio La Rosa*, Roberto Capobianco, Daniele Nardi*
In this paper we present a novel mechanism to get explanations that allow to better understand network predictions when dealing with sequential data. Specifically, we adopt memory-based networks — Differential Neural Computers — to exploit their capability of storing data in...Assessing SATNet's Ability to Solve the Symbol Grounding Problem
Michael Spranger, Oscar Chang*, Lampros Flokas*, Hod Lipson*
SATNet is an award-winning MAXSAT solver that can be used to infer logical rules and integrated as a differentiable layer in a deep neural network. It had been shown to solve Sudoku puzzles visually from examples of puzzle digit images, and was heralded as an impressive achi...Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation
Uchenna Akujuobi, Jun Chen*, Mohamed Elhoseiny*, Michael Spranger, Xiangliang Zhang*
Understanding the relationships between biomedical terms like viruses, drugs, and symptoms is essential in the fight against diseases. Many attempts have been made to introduce the use of machine learning to the scientific process of hypothesis generation (HG), which refers ...Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
Lemeng Wu*, Bo Liu*, Peter Stone, Qiang Liu*
We propose firefly neural architecture descent, a general framework for progressively and dynamically growing neural networks to jointly optimize the networks' parameters and architectures. Our method works in a steepest descent fashion, which iteratively finds the best netw...An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch
Siddharth Desai*, Ishan Durugkar, Haresh Karnan*, Garrett Warnell*, Josiah Hanna*, Peter Stone
We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning. This problem is par...Reinforcement Learning for Optimization of COVID-19 Mitigation Policies
Varun Kompella, Roberto Capobianco, Stacy Jong*, Jonathan Browne*, Spencer Fox*, Lauren Meyers*, Pete Wurman, Peter Stone
The year 2020 has seen the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world are faced with the challenge of protecting public health, while keeping the economy running to the greatest extent possible. Epidemiologi...T-PAIR: Temporal node-pair embedding for automatic biomedical hypothesis generation
Uchenna Akujuobi, Michael Spranger, Sucheendra K Palaniappan*, Xiangliang Zhang*
In this paper, we study an automatic hypothesis generation (HG) problem, which refers to the discovery of meaningful implicit connections between scientific terms, including but not limited to diseases, chemicals, drugs, and genes extracted from databases of biomedical publica...