Meta-Sift: How to Sift Out a Clean Subset in the Presence of Data Poisoning?

VIEW PUBLICATION

Yi Zeng

Minzhou Pan*

Himanshu Jahagirdar*

Lingjuan Lyu

Ruoxi Jia*

* External authors

USENIX Security '23

2023

Abstract

External data sources are increasingly being used to train machine learning (ML) models as the data demand increases. However, the integration of external data into training poses data poisoning risks, where malicious providers manipulate their data to compromise the utility or integrity of the model. Most data poisoning defenses assume access to a set of clean data (referred to as the base set), which could be obtained through trusted sources. But it also becomes common that entire data sources for an ML task are untrusted (e.g., Internet data). In this case, one needs to identify a subset within a contaminated dataset as the base set to support these defenses.

This paper starts by examining the performance of defenses when poisoned samples are mistakenly mixed into the base set. We analyze five representative defenses that use base sets and find that their performance deteriorates dramatically with less than 1% poisoned points in the base set. These findings suggest that sifting out a base set with \emph{high precision} is key to these defenses' performance. Motivated by these observations, we study how precise existing automated tools and human inspection are at identifying clean data in the presence of data poisoning. Unfortunately, neither effort achieves the precision needed that enables effective defenses. Worse yet, many of the outcomes of these methods are worse than random selection.

In addition to uncovering the challenge, we take a step further and propose a practical countermeasure, Meta-Sift. Our method is based on the insight that existing poisoning attacks shift data distributions, resulting in high prediction loss when training on the clean portion of a poisoned dataset and testing on the corrupted portion. Leveraging the insight, we formulate a bilevel optimization to identify clean data and further introduce a suite of techniques to improve the efficiency and precision of the identification. Our evaluation shows that Meta-Sift can sift a clean base set with 100% precision under a wide range of poisoning threats. The selected base set is large enough to give rise to successful defense when plugged into the existing defense techniques.

Related Publications

How to Evaluate and Mitigate IP Infringement in Visual Generative AI?

ICML, 2025
Zhenting Wang, Chen Chen, Vikash Sehwag, Minzhou Pan*, Lingjuan Lyu

The popularity of visual generative AI models like DALL-E 3, Stable Diffusion XL, Stable Video Diffusion, and Sora has been increasing. Through extensive evaluation, we discovered that the state-of-the-art visual generative models can generate content that bears a striking r…

Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models

CVPR, 2025
Jie Ren, Kangrui Chen, Yingqian Cui, Shenglai Zeng, Hui Liu, Yue Xing, Jiliang Tang, Lingjuan Lyu

Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, the advancement of T2I diffusion models presents significant risks, as the models could be exploited for malicious purposes, suc…

CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI

CVPR, 2025
Siyuan Cheng, Lingjuan Lyu, Zhenting Wang, Xiangyu Zhang, Vikash Sehwag

With the rapid advancement of generative AI, it is now pos-sible to synthesize high-quality images in a few seconds.Despite the power of these technologies, they raise signif-icant concerns regarding misuse. Current efforts to dis-tinguish between real and AI-generated image…

SEE ALL

HOME
Publications
Meta-Sift: How to Sift Out a Clean Subset in the Presence of Data Poisoning?

JOIN US

Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.

LEARN MORE