Authors
- Junyoung Seo
- Jisang Han
- Jaewoo Jung
- Siyoon Jin
- Joungbin Lee
- Takuya Narihira
- Kazumi Fukuda
- Takashi Shibuya
- Donghoon Ahn
- Shoukang Hu
- Seungryong Kim*
- Yuki Mitsufuji
* External authors
Venue
- AAAI-26
Date
- 2025
Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry
Abstract
We introduce Vid-CamEdit, a novel framework for video camera trajectory editing, enabling the re-synthesis of monocular videos along user-defined camera paths. This task is challenging due to its ill-posed nature and the limited multi-view video data for training. Traditional reconstruction methods struggle with extreme trajectory changes, and existing generative models for dynamic novel view synthesis cannot handle in-the-wild videos. Our approach consists of two steps: estimating temporally consistent geometry, and generative rendering guided by this geometry. By integrating geometric priors, the generative model focuses on synthesizing realistic details where the estimated geometry is uncertain. We eliminate the need for extensive 4D training data through a factorized fine-tuning framework that separately trains spatial and temporal components using multi-view image and video data. Our method outperforms baselines in producing plausible videos from novel camera trajectories, especially in extreme extrapolation scenarios on real-world footage.
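The abstract describes a two-stage pipeline: first estimate temporally consistent geometry from the monocular input, then perform generative rendering along the new trajectory, guided by that geometry. The following is a highly simplified, hypothetical sketch of that control flow; every function, class, and field name here is an illustrative assumption, not the authors' actual API, and the stages are stubbed with toy computations.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract.
# All names (Frame, estimate_consistent_geometry, generative_render, etc.)
# are assumptions for illustration; the real system uses learned models.

from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    rgb: list          # placeholder for an H x W x 3 image
    camera_pose: list  # placeholder for a camera extrinsic matrix


def estimate_consistent_geometry(video: List[Frame]) -> List[dict]:
    """Stage 1 (stub): estimate temporally consistent geometry
    (e.g. per-frame depth and camera poses) from the input video."""
    return [{"depth": [0.0], "pose": f.camera_pose} for f in video]


def generative_render(geometry: List[dict],
                      new_trajectory: List[list]) -> List[Frame]:
    """Stage 2 (stub): render frames along the user-defined trajectory,
    where a generative model would synthesize details in regions the
    estimated geometry leaves uncertain."""
    return [Frame(rgb=[0.0], camera_pose=pose)
            for _, pose in zip(geometry, new_trajectory)]


def edit_camera_trajectory(video: List[Frame],
                           new_trajectory: List[list]) -> List[Frame]:
    """Full pipeline: geometry estimation, then geometry-guided rendering."""
    geometry = estimate_consistent_geometry(video)
    return generative_render(geometry, new_trajectory)
```

The point of the sketch is the separation of concerns: geometric estimation constrains the output where the scene is observed, and the generative component is responsible only for the uncertain regions, which is what lets the method handle extreme trajectory extrapolation.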