Exploring in the Dark: Pure Exploration for POMDPs

Wed 10.09 16:00 - 16:30

Abstract: Unsupervised pre-training has achieved remarkable success in NLP and computer vision by leveraging large-scale unlabeled data. In reinforcement learning, such pre-training—often termed pure exploration—aims to learn task-agnostic exploratory policies that can be efficiently fine-tuned for downstream tasks. However, existing pure exploration methods predominantly assume full observability, a limitation that precludes their application to most real-world scenarios. The core challenge lies in defining the exploration objective: while fully observable methods target state-space coverage, partial observability fundamentally obscures the true state space. This ambiguity can lead to degenerate solutions where agents maximize observation diversity through trivial behaviors rather than meaningful exploration. We propose a novel approach that addresses this challenge by learning latent state representations through dynamics modeling, enabling principled exploration in the latent space as a proxy for true state coverage. We demonstrate that our method enables efficient adaptation to sparse-reward POMDP tasks, significantly outperforming baselines that lack structured pre-training.
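To make the idea concrete, below is a minimal, hypothetical sketch of the kind of pipeline the abstract describes: a history encoder trained with a dynamics-modeling objective produces latent states, and coverage of that latent space is rewarded during pre-training. The specific choices here (a GRU encoder, next-observation prediction as the dynamics loss, a k-nearest-neighbour particle estimate of latent coverage as the intrinsic reward, and all names and hyperparameters) are illustrative assumptions, not details taken from the talk.

```python
# Illustrative sketch only -- not the speaker's method or released code.
# Assumptions: GRU history encoder, next-observation prediction as the
# dynamics objective, k-NN particle entropy in latent space as the
# intrinsic (pure-exploration) reward.
import torch
import torch.nn as nn

class LatentDynamicsModel(nn.Module):
    """Encode the observation history into a latent state and predict the next observation."""
    def __init__(self, obs_dim, act_dim, latent_dim=32):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, latent_dim, batch_first=True)
        self.transition = nn.Sequential(              # predicts o_{t+1} from latent_t and a_t
            nn.Linear(latent_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, obs_dim),
        )

    def forward(self, obs_seq, act_seq):
        # obs_seq: (B, T, obs_dim); act_seq: (B, T, act_dim), a_t taken after observing o_t
        latents, _ = self.encoder(obs_seq)            # belief-like latent state per timestep
        pred_next_obs = self.transition(
            torch.cat([latents[:, :-1], act_seq[:, :-1]], dim=-1)
        )
        return latents, pred_next_obs

def dynamics_loss(model, obs_seq, act_seq):
    """Train the encoder so latents retain whatever history is needed to predict dynamics."""
    _, pred_next_obs = model(obs_seq, act_seq)
    return ((pred_next_obs - obs_seq[:, 1:]) ** 2).mean()

def latent_coverage_reward(latents, k=5):
    """Intrinsic reward: distance to the k-th nearest neighbour in latent space,
    a common particle-based proxy for latent-state entropy/coverage."""
    dists = torch.cdist(latents, latents)                       # (N, N) pairwise distances
    kth_dist = dists.topk(k + 1, largest=False).values[:, -1]   # skip the zero self-distance
    return torch.log(kth_dist + 1.0)
```

In such a setup, pre-training would alternate between collecting trajectories, fitting the dynamics model, and updating the exploration policy with any standard RL algorithm using the latent-coverage reward; fine-tuning would then reuse the encoder and policy with the downstream task reward.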

Speaker

Yonatan Ashlag

Technion

  • Advisor: Kfir Yehuda Levy

  • Academic Degree: M.Sc.