Between the Layers Lies the Truth: Uncertainty Estimation in LLMs using Intra-Layer Local Information Scores
Wed 27.08 13:00 - 13:30
- Graduate Student Seminar
BME 201, Faculty of Biomedical Engineering
Abstract: As large language models (LLMs) are increasingly deployed in high-risk and knowledge-intensive settings, the problem of hallucinations becomes critical. Current uncertainty estimation methods are often either extrinsic (e.g., RAG) or computationally expensive (e.g., Bayesian approaches such as MC Dropout), relying on external knowledge sources or multiple forward passes. We propose a novel intrinsic uncertainty estimation method grounded in information-theoretic principles, which leverages the KL divergence between softmax-normalized internal representations across transformer layers. Our approach sits at the intersection of probing techniques and the Information Bottleneck framework, aiming to quantify internal confidence signals without external data or architectural modifications. We benchmark our method against prior intrinsic approaches such as probing-based token confidence estimation and extrinsic methods such as P(True), demonstrating competitive or superior performance on uncertainty calibration and error detection tasks. Our method is architecture-agnostic and can be extended to vision models (e.g., CNNs, ViTs) and non-generative tasks (e.g., classification, segmentation), paving the way for broader and more efficient uncertainty-aware AI systems.
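
To make the core idea concrete, below is a minimal sketch of one possible reading of "KL divergence between softmax-normalized internal representations across transformer layers". It is not the speaker's implementation: the choice of model (gpt2), applying softmax over the feature dimension of each layer's hidden state, comparing consecutive layer pairs, and averaging the resulting KL values over tokens and layers into a single score are all illustrative assumptions.

```python
# Illustrative sketch only; not the method presented in the talk.
# Assumptions: gpt2 as the backbone, softmax over the hidden (feature)
# dimension, KL between consecutive layers, mean aggregation as the score.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM that exposes hidden states
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def layerwise_kl_score(text: str) -> float:
    """Hypothetical uncertainty score: average KL divergence between
    softmax-normalized hidden states of consecutive transformer layers."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, output_hidden_states=True)
    # hidden_states: tuple of (num_layers + 1) tensors, each [1, seq_len, d_model]
    hidden = torch.stack(outputs.hidden_states, dim=0)   # [L+1, 1, T, d]
    probs = F.softmax(hidden, dim=-1)                    # normalize over the feature dim
    p, q = probs[1:], probs[:-1]                         # consecutive layer pairs
    # KL(p || q) per layer pair and token, summed over the feature dimension
    kl = (p * (p.clamp_min(1e-12).log() - q.clamp_min(1e-12).log())).sum(-1)
    return kl.mean().item()  # aggregate over layers and tokens (assumption)

print(layerwise_kl_score("The capital of Australia is Sydney."))
```

Note that a score of this form needs only a single forward pass and no external data, which is consistent with the intrinsic, efficiency-oriented framing of the abstract.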
