Toward Generative Models that Understand the Visual World
Sunday, 30.03, 10:30 - 11:20
- Faculty Seminar
Bloomfield 527
Abstract:
Despite remarkable advances, visual generative models still fall short of faithfully modeling the world, struggling with fundamental aspects such as spatial relations, physics, motion, and dynamic interactions.
In this talk, I present a line of work that tackles these challenges, grounded in a deep understanding of the inner mechanisms that drive these models. I will begin by analyzing state-of-the-art visual generators to uncover the underlying reasons for their limited understanding. Building on these insights, I will demonstrate methods that significantly enhance both spatial and temporal reasoning in image and video generation, surpassing even resource-intensive proprietary models without relying on additional data or model scaling. I will conclude by discussing open challenges and future directions for advancing faithful world modeling in visual generative models.