On Estimation over Large Function Classes
Sunday 07.12, 10:30 - 11:30
- Faculty Seminar
- Bloomfield 527
Abstract:
In the first part of this talk, we study the statistical performance of Maximum Likelihood Estimation (MLE) and, more generally, Empirical Risk Minimization (ERM) over large function classes. While ERM is known to be minimax optimal for low‑complexity models, classical work (Le Cam; Birgé–Massart) and recent results (Sur–Candès for high‑dimensional logistic regression) show that it can be sharply suboptimal. First, we develop a general framework for detecting and quantifying the suboptimality of ERM in regression over large classes. Second, we show that the variance term of ERM procedures is always upper-bounded by the minimax rate, so any minimax suboptimality must come from bias.
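To make the object of study concrete, here is a minimal sketch of ERM for regression over a (toy, finite) function class; the data-generating function, the class of candidate sinusoids, and all parameter choices are illustrative assumptions, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = f*(x) + noise, with f*(x) = sin(2*pi*x).
# (Hypothetical setup for illustration only.)
n = 200
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

# A small finite function class: sinusoids at candidate frequencies.
candidate_freqs = np.arange(1, 11)
function_class = [lambda t, k=k: np.sin(2 * np.pi * k * t)
                  for k in candidate_freqs]

def empirical_risk(f):
    # Empirical squared-loss risk of a candidate function f.
    return np.mean((y - f(x)) ** 2)

# ERM: pick the class member minimizing the empirical risk.
risks = [empirical_risk(f) for f in function_class]
f_hat = function_class[int(np.argmin(risks))]
```

For small classes like this one, ERM behaves well; the phenomena in the talk concern what happens when the class is large enough that minimizing empirical risk no longer tracks the true risk uniformly.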
In the second part of this talk, we propose a framework that explains the success of Test-Time Training (TTT) in foundation models, which we primarily validate through experiments with Sparse Autoencoders (SAEs). TTT identifies the "most similar" points in the training data to a given evaluation point and improves the model's prediction by locally adapting it to this selected neighborhood. Although TTT was already discussed in early works (MacKay; Bottou–Vapnik), this approach has only recently been shown to yield significant performance improvements across domains such as control and language modeling.
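The nearest-neighborhood adaptation described above can be sketched as follows; this is a simplified illustration (a per-query local least-squares refit on a toy one-dimensional problem), not the procedure or experiments from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data from a nonlinear function; a single global linear model underfits.
# (Hypothetical setup for illustration only.)
n = 500
X = rng.uniform(-2.0, 2.0, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

def fit_linear(X, y):
    # Least-squares fit with an intercept term.
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, x):
    return w[0] * x[0] + w[1]

def ttt_predict(x_query, k=25):
    # Test-time training in the nearest-neighbor sense: select the k
    # "most similar" training points to the query, refit the simple
    # model on that neighborhood, and predict with the adapted model.
    dists = np.abs(X[:, 0] - x_query[0])
    idx = np.argsort(dists)[:k]
    w_local = fit_linear(X[idx], y[idx])
    return predict_linear(w_local, x_query)

# Global fit, for comparison against the locally adapted prediction.
w_global = fit_linear(X, y)
x0 = np.array([1.9])
```

Here the locally refit model tracks the curvature of sin near the query, while the global linear fit cannot; the talk's framework aims to explain when and why this kind of local adaptation helps at foundation-model scale.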