Two Lenses on Deep Learning: Data Reconstruction and Transformer Structure – Job Talk
Monday 02.02, 11:30 - 12:20
- Faculty Seminar
Bloomfield 527
Abstract: Despite the remarkable success of modern deep learning, our theoretical understanding remains limited. Many fundamental questions about how these models learn, what they memorize, and what their architectures can express are still largely open. In this talk, I focus on two such questions that offer complementary perspectives on the behavior of modern networks.
First, I examine how standard training procedures implicitly encode aspects of the training data in the learned parameters, enabling reconstruction of training samples across a wide range of architectures and loss functions. Second, I turn to transformers and analyze how architectural choices, such as the number of attention heads, their rank, and the network depth, shape their expressive capabilities, revealing both strengths and inherent limitations of low-rank attention.
Together, these perspectives highlight recurring principles that shape the behavior of deep models, bringing us closer to a theoretical framework that can explain and predict the phenomena observed in practice.