Authors

* External authors

Venue

Date

Share

SilentCipher: Deep Audio Watermarking

Mayank Kumar Singh*

Naoya Takahashi

Yuki Mitsufuji

Wei-Hsiang Liao

* External authors

INTERSPEECH 2024

2024

Abstract

In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and robustness over traditional methods, the encoded messages introduce audible artefacts that restricts their usage in professional settings. In this study, we introduce three key innovations. Firstly, our work is the first deep learning-based model to integrate psychoacoustic model based thresholding to achieve imperceptible watermarks. Secondly, we introduce psuedo-differentiable compression layers, enhancing the robustness of our watermarking algorithm. Lastly, we introduce a method to eliminate the need for perceptual losses, enabling us to achieve SOTA in both robustness as well as imperceptible watermarking. Our contributions lead us to SilentCipher, a model enabling users to encode messages within audio signals sampled at 44.1kHz.

Related Publications

PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

NeurIPS, 2024
Dongjun Kim*, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon*

To accelerate sampling, diffusion models (DMs) are often distilled into generators that directly map noise to data in a single step. In this approach, the resolution of the generator is fundamentally limited by that of the teacher DM. To overcome this limitation, we propose …

GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping

NeurIPS, 2024
Junyoung Seo, Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, Chieh-Hsin Lai, Seungryong Kim*, Yuki Mitsufuji

Generating novel views from a single image remains a challenging task due to the complexity of 3D scenes and the limited diversity in the existing multi-view datasets to train a model on. Recent research combining large-scale text-to-image (T2I) models with monocular depth e…

The whole is greater than the sum of its parts: improving music source separation by bridging networks

EURASIP, 2024
Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich*, Shusuke Takahashi*, Yuki Mitsufuji

This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) with almost no increasing calculation cost. It consists of three components: (i) multi-domain loss (MDL), (ii) bridging operation…

JOIN US

Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.