Research Intern – Multimodal Foundation Model for Vision

Machine Learning, Computer Vision

Internship

Location flexible (Tokyo, Europe, US)

Sony AI is seeking research interns to join us. Our team conducts fundamental and applied research, with a focus on building next-generation foundation models for vision in a responsible manner. As a research intern, you will develop efficient and effective methodologies and prototype solutions. You will work with a productive team of world-class scientists and engineers to tackle the most challenging problems in foundation models and generative AI, including low-cost yet powerful vision foundation models (VFMs), vision-language models (VLMs), unified models, automatic model compression, and optimization and deployment on cloud and edge. You will see your ideas not only published in papers but also improving the experience of billions of customers.

Roles and Responsibilities

Conduct fundamental and innovative development in low-cost yet powerful vision-language models (VLMs), unified models, automatic model compression, and optimization and deployment on cloud and edge.
Design or implement state-of-the-art techniques for model compression, inference speedup, deployment on hardware, and tool automation.
Build proofs of concept (PoCs) for various vision-and-text and generation-related tasks (VQA, captioning, understanding, etc.) across a range of hardware.
Contribute to library and tool development to support the business, or publish influential research in top-tier conferences and journals.

Required Qualifications and Skills

Currently has, or is in the process of obtaining, a master's or PhD degree in computer science or a related field.
Highly self-motivated and capable of proposing and implementing innovative ideas.
Solid presentation and communication skills for internal and external audiences.
Publications or expertise in compact foundation model development and deployment, or influential open-source projects or papers published at top conferences such as CVPR, ICCV, ECCV, NeurIPS, ICML, and ACL.
Front-end development experience is a plus.
Solid coding skills in Python, PyTorch, etc.

Working Location

Location flexible (Tokyo, Europe, US)

