CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI


Siyuan Cheng

Lingjuan Lyu

Zhenting Wang

Xiangyu Zhang

Vikash Sehwag

CVPR-25

2025

Abstract

With the rapid advancement of generative AI, it is now possible to synthesize high-quality images in a few seconds. Despite their power, these technologies raise significant concerns about misuse. Current efforts to distinguish between real and AI-generated images may lack generalization: they can be effective for only certain types of generative models and susceptible to post-processing techniques such as JPEG compression. To overcome these limitations, we propose a novel framework, CO-SPY, that first enhances existing semantic features (e.g., the number of fingers in a hand) and artifact features (e.g., pixel value differences), and then adaptively integrates them to achieve more general and robust synthetic image detection. Additionally, we create CO-SPYBENCH, a comprehensive dataset comprising 5 real image datasets and 22 state-of-the-art generative models, including the latest models such as FLUX. We also collect 50k synthetic images in the wild from the Internet to enable evaluation in a more practical setting. Our extensive evaluations demonstrate that our detector outperforms existing methods under identical training conditions, achieving an average accuracy improvement of approximately 11% to 34%.
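The abstract names two ingredients: pixel-level artifact features (such as differences between neighbouring pixel values) and an adaptive integration of those with semantic features. The Python sketch below illustrates both ideas under stated assumptions; it is not the CO-SPY implementation. The semantic logit is assumed to come from some pretrained encoder's classifier head, and the names `pixel_differences`, `artifact_logit`, and `adaptive_fusion`, along with all weights and gate values, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def pixel_differences(img: Image) -> List[float]:
    """Differences between horizontally and vertically adjacent pixels.

    Hypothetical stand-in for pixel-level artifact features, not CO-SPY's extractor.
    """
    h, w = len(img), len(img[0])
    horizontal = [img[y][x + 1] - img[y][x] for y in range(h) for x in range(w - 1)]
    vertical = [img[y + 1][x] - img[y][x] for y in range(h - 1) for x in range(w)]
    return horizontal + vertical


def artifact_logit(diffs: List[float], weight: float = 1.0, bias: float = 0.0) -> float:
    """Hypothetical linear head on the mean absolute difference.

    In a trained detector, weight and bias would be learned from data.
    """
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    return weight * mean_abs + bias


def adaptive_fusion(
    semantic_logit: float, art_logit: float, gate_logits: Tuple[float, float]
) -> Tuple[float, Tuple[float, float]]:
    """Softmax-gated mix of the two branch logits, squashed to P(synthetic).

    Here the gate logits are plain inputs; a trained model would predict them per image.
    """
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]  # subtract the max for numerical stability
    total = sum(exps)
    w_sem, w_art = exps[0] / total, exps[1] / total
    fused = w_sem * semantic_logit + w_art * art_logit
    prob = 1.0 / (1.0 + math.exp(-fused))  # sigmoid of the fused logit
    return prob, (w_sem, w_art)


# Usage on a tiny 2x2 image with made-up (hypothetical) parameters.
img = [[0.0, 1.0], [2.0, 3.0]]
diffs = pixel_differences(img)  # [1.0, 1.0, 2.0, 2.0]
a = artifact_logit(diffs, weight=2.0, bias=-3.0)  # 2.0 * 1.5 - 3.0 = 0.0
prob, weights = adaptive_fusion(semantic_logit=1.2, art_logit=a, gate_logits=(0.0, 0.0))
print(f"P(synthetic)={prob:.3f}, weights={weights}")
```

A softmax gate is one simple way to let a detector lean on whichever branch is more informative for a given image; the abstract does not specify the exact integration mechanism CO-SPY uses.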

Related Publications

How to Evaluate and Mitigate IP Infringement in Visual Generative AI?

ICML, 2025
Zhenting Wang, Chen Chen, Vikash Sehwag, Minzhou Pan*, Lingjuan Lyu

The popularity of visual generative AI models like DALL-E 3, Stable Diffusion XL, Stable Video Diffusion, and Sora has been increasing. Through extensive evaluation, we discovered that the state-of-the-art visual generative models can generate content that bears a striking r…

Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models

CVPR, 2025
Jie Ren, Kangrui Chen, Yingqian Cui, Shenglai Zeng, Hui Liu, Yue Xing, Jiliang Tang, Lingjuan Lyu

Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, the advancement of T2I diffusion models presents significant risks, as the models could be exploited for malicious purposes, suc…

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

CVPR, 2025
Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu

As scaling laws in generative AI push performance, they simultaneously concentrate the development of these models among actors with large computational resources. With a focus on text-to-image (T2I) generative models, we aim to unlock this bottleneck by demonstrating very l…


