Understanding and Enhancing Deep Neural Networks with Automated Interpretability
Sun 02.02 10:30 - 11:30
- Faculty Seminar
Bloomfield 527
Abstract:
Deep neural networks are becoming remarkably capable: they can generate realistic images, engage in complex dialogues, analyze intricate data, and perform tasks in an almost human-like way. But how do these models achieve such abilities?
In this talk, I will present a line of work that aims to explain the behaviors of deep neural networks. This includes a new approach for evaluating cross-domain knowledge encoded in generative models, tools for uncovering core mechanisms in large language models, and an analysis of how those mechanisms change under fine-tuning. I will show how to automate and scale the scientific process of interpreting neural networks with the Automated Interpretability Agent, a system that autonomously designs experiments on models’ internal representations to explain their behaviors. I will then demonstrate how such understanding enables mitigating biases and improving model performance. The talk will conclude with a discussion of future directions, including developing universal interpretability tools and extending interpretability methods to automate scientific discovery.