Vision and Sensing
Our research aims to revolutionize how vision AI is deployed and applied in the real world, by developing compact, high-performance foundation models and robust imaging pipelines.
Reimagining Vision AI
Building compact foundation models and robust pipelines for vision and sensing.
Sony AI is exploring cutting-edge vision technologies to maximize vision and sensing capabilities using AI. This work includes a series of compact, powerful vision foundation models, along with model compression and optimization techniques, that will help shape the future of edge AI. At the same time, we are designing AI pipelines for both imaging and sensing that leverage the latest advances in computer vision and machine learning.
Breakthroughs in Vision & Sensing
Learn about our global initiatives that tackle complex application environments and open up new business opportunities. See how these projects push the boundaries of vision and sensing.
OUR WORK
Argus: A Compact and Versatile Vision Foundation Model
Argus is a powerful vision foundation model designed to handle 12 different computer vision tasks, in applications ranging from smart cities to retail, on both cloud and edge devices. By training on just a fraction of the usual data, we’ve created a high-performance backbone that is versatile, practical, and significantly more accessible to deploy.
OUR WORK
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
We’ve developed a strategy to democratize high-end generative AI by training a 1.1-billion-parameter model for under $2,000, over 100x less than the cost of training traditional Stable Diffusion models. Using smart masking and synthetic data, we achieve high-quality image generation while showing that top-tier research doesn’t always require massive computational budgets.
OUR WORK
Beyond RGB: Adaptive Parallel Processing for RAW Object Detection
Traditional image processing is built for human eyes, but it often throws away data that AI needs to see clearly. Our Raw Adaptation Module (RAM) mimics the human visual system’s parallel processing to extract more detail directly from raw sensor data, leading to state-of-the-art object detection even in difficult lighting and high-dynamic-range environments.
Deep Dive Into Our Research
Discover how we are leveraging breakthroughs in computer vision and machine learning to unlock new business opportunities and navigate complex application environments.
AI Pipeline
Our goal is to maximize imaging and sensing capabilities using AI. Traditional RGB pipelines are optimized for human perception and often discard information that could be critical for specific applications, such as scientific analysis or autonomous navigation. To address this, we are creating AI-optimized pipelines for computer vision.
Vision Foundation Models
Our compact vision foundation models will help shape the future of edge AI, especially given the limited memory and computational capabilities of edge devices. These models provide numerous advantages beyond just deployment flexibility. They require fewer compute resources, enable faster inference speeds, and contribute to smaller carbon footprints.
Recent Publications
Our team contributes to the global scientific discourse through peer-reviewed journals and technical papers. Explore the methodologies and data-driven insights behind our advancements.
Beyond RGB: Adaptive Parallel Processing for RAW Object Detection
Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Argus: A Compact and Versatile Vision Foundation Model
Latest Updates from Sony AI
Explore our latest resources to stay informed on our progress in advancing innovation around vision and sensing.
Video
COALA: A Practical and Vision-Centric Federated Learning Platform
Blog
Sights on AI: Lingjuan Lyu Discusses Her Career in Privacy-Preserving AI and Staying Inspired As The AI Landscape Has Advanced
Blog
Research That Scales, Adapts, and Creates: Spotlighting Sony AI at CVPR 2025
Blog
Sights on AI: Expanding Human Perception Through AI – A Conversation with Daisuke Iso
Blog
Unleashing the Potential of Federated Learning with COALA