Predicting RAG Performance for Text Completion

Tue 18.03 08:30 - 09:00

Abstract: In this presentation, we'll explore how to predict the effectiveness of Retrieval-Augmented Generation (RAG) in large language models (LLMs) for text completion; specifically, how to predict the improvement in perplexity that RAG provides. To tackle this challenge, we introduce new supervised prediction methods tailored to the characteristics of text completion. These methods significantly outperform a range of existing prediction approaches originally designed for ad hoc document retrieval. We'll also show that combining our post-retrieval predictors with recently proposed post-generation predictors, which analyze the model's next-token distribution, yields a statistically significant improvement in prediction accuracy over using post-generation predictors alone. Finally, we'll show that our post-retrieval predictors are just as effective as post-generation predictors for selectively applying RAG. This matters for efficiency: post-retrieval predictors can decide whether to apply RAG before any generation takes place, making selective RAG a promising direction for optimizing LLM performance.
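
To make the quantities in the abstract concrete, below is a minimal, illustrative Python sketch; it is not the speaker's actual method, and all names and the threshold value are assumptions. It shows the prediction target (the drop in perplexity when RAG is applied) and a simple post-generation predictor that uses the entropy of the model's next-token distribution to decide whether to apply retrieval at all (selective RAG).

    import numpy as np

    def perplexity(mean_nll: float) -> float:
        """Perplexity is exp of the mean per-token negative log-likelihood."""
        return float(np.exp(mean_nll))

    def rag_perplexity_gain(mean_nll_plain: float, mean_nll_rag: float) -> float:
        """The prediction target: how much RAG lowers perplexity."""
        return perplexity(mean_nll_plain) - perplexity(mean_nll_rag)

    def next_token_entropy(logits: np.ndarray) -> float:
        """Shannon entropy of the next-token distribution; a common
        post-generation uncertainty signal."""
        z = logits - logits.max()              # numerically stable softmax
        probs = np.exp(z) / np.exp(z).sum()
        return float(-(probs * np.log(probs + 1e-12)).sum())

    def should_apply_rag(logits: np.ndarray, threshold: float = 1.0) -> bool:
        """Selective RAG: retrieve only when the model looks uncertain.
        The threshold is illustrative; in practice it would be tuned
        on held-out data."""
        return next_token_entropy(logits) > threshold

    # Toy usage over a fake 8-token vocabulary:
    confident = np.array([10.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
    uncertain = np.zeros(8)                    # uniform -> maximal entropy
    print(should_apply_rag(confident))         # False: skip retrieval
    print(should_apply_rag(uncertain))         # True: apply RAG
    print(rag_perplexity_gain(3.2, 2.9))       # positive -> RAG helped

A post-retrieval predictor would make the same apply/skip decision from the retrieved documents alone, before any LLM forward pass, which is where the efficiency benefit highlighted in the talk comes from.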

Speaker

Oz Huly

Technion

  • Advisor: Oren Kurland

  • Academic Degree: M.Sc.