CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI

Siyuan Cheng

Lingjuan Lyu

Zhenting Wang

Xiangyu Zhang

Vikash Sehwag

CVPR, 2025

Abstract

With the rapid advancement of generative AI, it is now possible to synthesize high-quality images in a few seconds. Despite the power of these technologies, they raise significant concerns regarding misuse. Current efforts to distinguish between real and AI-generated images may lack generalization, being effective for only certain types of generative models and susceptible to post-processing techniques like JPEG compression. To overcome these limitations, we propose a novel framework, CO-SPY, that first enhances existing semantic features (e.g., the number of fingers in a hand) and artifact features (e.g., pixel value differences), and then adaptively integrates them to achieve more general and robust synthetic image detection. Additionally, we create CO-SPYBENCH, a comprehensive dataset comprising 5 real image datasets and 22 state-of-the-art generative models, including the latest models like FLUX. We also collect 50k synthetic images in the wild from the Internet to enable evaluation in a more practical setting. Our extensive evaluations demonstrate that our detector outperforms existing methods under identical training conditions, achieving an average accuracy improvement of approximately 11% to 34%. The code is available at https://github.com/Megum1/Co-Spy.
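The abstract's core idea, adaptively integrating a semantic feature branch with a pixel-artifact branch, can be illustrated with a minimal sketch. Everything below is hypothetical: the feature extractors are toy stand-ins (a channel-mean "semantic" summary and a neighboring-pixel-difference "artifact" summary), and the gate parameters would be learned in the actual CO-SPY training pipeline rather than set at random.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy stand-in for a semantic encoder (e.g., a CLIP-like embedding).
def semantic_features(image):
    return image.mean(axis=(0, 1))            # (C,) per-channel summary

# Toy stand-in for a pixel-artifact extractor: neighboring-pixel
# differences, the kind of low-level signal the abstract alludes to.
def artifact_features(image):
    diffs = np.diff(image, axis=0)            # vertical pixel differences
    return np.abs(diffs).mean(axis=(0, 1))    # (C,) artifact summary

def fuse(sem, art, w_gate, b_gate):
    # Adaptive fusion: a gate computes, per input, how much weight to
    # give each branch. w_gate / b_gate are illustrative constants here;
    # in practice they would be trained jointly with the detector.
    alpha = sigmoid(w_gate @ np.concatenate([sem, art]) + b_gate)
    return np.concatenate([alpha * sem, (1.0 - alpha) * art])

image = rng.random((32, 32, 3))               # fake RGB image in [0, 1)
sem = semantic_features(image)
art = artifact_features(image)
w_gate = rng.standard_normal(sem.size + art.size)
fused = fuse(sem, art, w_gate, 0.0)
print(fused.shape)                            # (6,)
```

The fused vector would then feed a standard binary classifier (real vs. synthetic); the per-input gate is what lets the detector lean on artifact cues when they survive post-processing and fall back to semantic cues when they do not.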

Related Publications

Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models

CVPR, 2025
Jie Ren, Kangrui Chen, Yingqian Cui, Shenglai Zeng, Hui Liu, Yue Xing, Jiliang Tang, Lingjuan Lyu

Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, the advancement of T2I diffusion models presents significant risks, as the models could be exploited for malicious purposes, suc…

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

CVPR, 2025
Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu

As scaling laws in generative AI push performance, they simultaneously concentrate the development of these models among actors with large computational resources. With a focus on text-to-image (T2I) generative models, we aim to unlock this bottleneck by demonstrating very l…

Argus: A Compact and Versatile Foundation Model for Vision

CVPR, 2025
Weiming Zhuang, Chen Chen, Zhizhong Li, Sina Sajadmanesh, Jingtao Li, Jiabo Huang, Vikash Sehwag, Vivek Sharma, Hirotaka Shinozaki, Felan Carlo Garcia, Yihao Zhan, Naohiro Adachi, Ryoji Eki, Michael Spranger, Peter Stone, Lingjuan Lyu

While existing vision and multi-modal foundation models can handle multiple computer vision tasks, they often suffer from significant limitations, including huge demand for data and computational resources during training and inconsistent performance across vision tasks at d…

