Statistics-Powered ML: Reliable Black-Box Inference from Untrusted Data

Tue 25.11 10:30 - 11:30

Abstract

AI systems are increasingly shaping people’s lives, opportunities, and scientific progress. But how can we trust the inferences of such complex, black-box systems? This question becomes even more urgent in the presence of two core challenges that are ubiquitous in high-stakes applications: data scarcity and test-time distribution shift. These issues not only limit the utility of AI systems but can also lead to misleading conclusions and unexpected failures.

In response to these challenges, this talk explores how fundamental statistical principles and modern ML can empower one another to enable trustworthy and practically useful inferences.

The first part focuses on reliable inference under limited data. I’ll introduce a framework that safely enhances the sample efficiency of any statistical inference procedure, such as conformal prediction or hypothesis testing, by adaptively leveraging synthetic data (e.g., from generative models). Crucially, this approach provides distribution-free error control guarantees without imposing any assumptions on the quality of the synthetic data. I’ll demonstrate its broad applicability across diverse domains, from reliable protein structure prediction to principled win-rate evaluation of large reasoning models.

The second part focuses on making models robust to drifting data. I’ll introduce a new approach to test-time training grounded in sequential statistical testing. Building on conformal betting martingales, I’ll first present a principled monitoring tool for detecting data drift. Using this tool, I’ll derive a rigorous ‘anti-drift correction’ mechanism grounded in (online) optimal transport principles. This mechanism forms the foundation of a self-training scheme that promotes invariance to dynamically changing environments. I’ll outline the key ideas and, time permitting, expand on the technical details.
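For readers new to conformal betting martingales, here is a minimal sketch of a generic conformal test martingale used as a drift monitor. It is a textbook-style illustration, not the speaker’s method. It assumes a fixed, pre-chosen nonconformity score (here the hypothetical choice `abs`, i.e., distance from an assumed in-control mean of 0). For each new point it computes a smoothed conformal p-value and bets on it with power betting functions f(p) = ε·p^(ε−1), averaged over a grid of ε values. If the stream is exchangeable, the p-values are i.i.d. uniform, the wealth process is a martingale, and Ville’s inequality bounds the probability of ever raising an alarm by `alpha`. The names `conformal_pvalue`, `log_mean_exp`, `monitor`, and `EPS_GRID` are hypothetical.

```python
import math
import random

# Hypothetical grid of power-betting exponents; the alarm uses their uniform average.
EPS_GRID = [k / 10 for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0


def conformal_pvalue(scores, rng):
    """Smoothed conformal p-value of the newest score relative to all scores so far.

    Larger scores count as more nonconforming, so a small p-value means the
    newest point looks unusual compared with the history.
    """
    a_n = scores[-1]
    greater = sum(1 for a in scores if a > a_n)
    equal = sum(1 for a in scores if a == a_n)  # always >= 1 (includes a_n itself)
    theta = 1.0 - rng.random()  # uniform on (0, 1], so the p-value is strictly positive
    return (greater + theta * equal) / len(scores)


def log_mean_exp(xs):
    """Numerically stable log(mean(exp(xs)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def monitor(stream, score_fn, alpha=0.01, seed=0):
    """Return the first index where the mixture martingale reaches 1/alpha, else None.

    Each component bets with f(p) = eps * p**(eps - 1), which integrates to 1 on
    [0, 1]. Under exchangeability, Ville's inequality bounds the probability of
    ever raising an alarm by alpha.
    """
    rng = random.Random(seed)
    scores = []
    log_ms = [0.0] * len(EPS_GRID)  # log-wealth of each component; all start at 1
    threshold = math.log(1.0 / alpha)
    for t, x in enumerate(stream):
        scores.append(score_fn(x))
        log_p = math.log(conformal_pvalue(scores, rng))
        for i, eps in enumerate(EPS_GRID):
            log_ms[i] += math.log(eps) + (eps - 1.0) * log_p
        if log_mean_exp(log_ms) >= threshold:
            return t
    return None


# Usage: a synthetic stream whose mean jumps from 0 to 3 at index 200.
data_rng = random.Random(1)
stream = [data_rng.gauss(0, 1) for _ in range(200)] + [data_rng.gauss(3, 1) for _ in range(200)]
alarm = monitor(stream, score_fn=abs)  # hypothetical score: |x - 0|
```

Averaging over several ε values is a deliberate choice. A single power martingale with small ε loses wealth steadily while the stream is stable, so it can react slowly after a change. The ε = 1 component never bets, which keeps the mixture from collapsing. On the shifted stream above, the alarm should typically fire shortly after index 200. On an unchanged stream, a false alarm has probability at most `alpha`.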

Speaker

Yaniv Romano

Technion

Yaniv Romano is an associate professor in the Departments of Electrical and Computer Engineering and Computer Science at the Technion. Previously, he was a postdoctoral scholar in the Department of Statistics at Stanford University. Yaniv holds a PhD, MSc, and BSc in Electrical Engineering, all from the Technion. His super-resolution technology, invented with Peyman Milanfar, has been integrated into Google’s flagship products, including the Pixel phone. His uncertainty quantification technique, developed with Emmanuel Candès, was employed by The Washington Post to estimate outstanding votes during the U.S. presidential election.

Yaniv has received several honors and awards, including the ERC Starting Grant, the SIAG/IS Early Career Prize, the Sheila Samson Prime Minister’s Prize (Researcher Recruitment Prize), the IEEE Signal Processing Society Best Paper Award, the Alon Scholarship, the Krill Prize for Excellence in Scientific Research, and the Henry Taub Prize for Academic Excellence. Yaniv is a member of the Young Israel Academy.