Assessing Policy Updates: Toward Intelligent User Interfaces

Sun 17.05 12:30 - 13:00

Abstract: Reinforcement learning agents are often updated with human feedback, yet such updates can be unreliable. Reward misspecification, preference conflicts, or limited data may leave policies unchanged or even worse. Because policies are difficult to interpret directly, users face the challenge of deciding whether an update has truly helped. We propose an Assessment Loop in which, following each update, users are presented with comparative demonstrations of the original and updated policies and decide whether to accept or reject the change before it takes effect. We conducted two controlled studies in qualitatively different environments, a custom Gridworld and the FruitBot arcade benchmark, in which participants provided feedback to an agent and then compared its original and updated policies under one of four demonstration strategies: no demonstration, random-context, same-context, and Salient-ContRast demonstrations designed to highlight informative behavioral differences. Across both environments, Salient-ContRast demonstrations significantly improved participants’ ability to detect whether updates had improved the agent’s behavior across novel contexts, with the advantage concentrated on updates that had degraded performance. Same-context demonstrations, in a complementary way, supported more accurate local verification of specific corrections. Salient-ContRast and same-context demonstrations both led to higher-performing final agents than the no-demonstration and random-context baselines, positioning demonstration context as a central design lever for interactive systems that learn from human feedback.

Speaker

Matan-Itamar Solomon

Technion

  • Advisors: Ofra Amir, Omer Ben Porat

  • Academic Degree: M.Sc.