On Estimation over Large Function Classes

Sunday 07.12, 10:30 - 11:30

Abstract: In the first part of this talk, we study the statistical performance of Maximum Likelihood Estimation (MLE) and, more generally, Empirical Risk Minimization (ERM) over large function classes. While ERM is known to be minimax optimal for low‑complexity models, classical work (Le Cam; Birgé–Massart) and recent results (Sur–Candès for high‑dimensional logistic regression) show that it can be sharply suboptimal. First, we develop a general framework for detecting and quantifying the suboptimality of ERM in regression over large classes. Second, we show that the variance term of ERM procedures is always upper-bounded by the minimax rate, so any minimax suboptimality must come from bias. In the second part of this talk, we propose a framework that explains the success of Test-Time Training (TTT) in foundation models, which we primarily validate through experiments with Sparse Autoencoders (SAEs). TTT identifies the "most similar" points in the training data to a given evaluation point and improves the model's prediction by locally adapting it to this selected neighborhood. Although TTT was already discussed in early works (MacKay; Bottou–Vapnik), this approach has only recently been shown to yield significant performance improvements across domains such as control and language modeling.
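To make the TTT idea concrete, here is a minimal illustrative sketch (not the speaker's actual method): the "local adaptation" step is instantiated as refitting a least-squares model on the k nearest training points to the evaluation point. The function names, the choice of Euclidean distance, and the ridge regularizer are all assumptions made for this toy example.

```python
import numpy as np

def fit_least_squares(X, y, reg=1e-6):
    # Ridge-regularized least squares; the small regularizer keeps the
    # solve stable on small neighborhoods.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)

def ttt_predict(X_train, y_train, x_test, k=20):
    """Toy test-time-training step: select the k 'most similar' training
    points (by Euclidean distance) and refit the model locally."""
    dists = np.linalg.norm(X_train - x_test, axis=1)
    neighbors = np.argsort(dists)[:k]
    w_local = fit_least_squares(X_train[neighbors], y_train[neighbors])
    return float(x_test @ w_local)
```

A global fit corresponds to `fit_least_squares(X_train, y_train)` used for every test point; the TTT variant instead solves a fresh, neighborhood-restricted problem per evaluation point, which is the local-adaptation pattern the abstract describes.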

Speaker

Gil Kur

ETH Zürich

Short bio:
Gil Kur is a postdoctoral fellow in the Department of Computer Science at ETH Zürich, hosted by Andreas Krause, Fanny Yang, and Afonso S. Bandeira. He completed his PhD in Electrical Engineering and Computer Science at MIT under the supervision of Sasha Rakhlin and earned an MSc from the Weizmann Institute of Science under Boaz Nadler. His research focuses on statistical learning theory and on nonparametric and high-dimensional statistics.