LIBERTy: An Interventional Benchmark for Concept-Based Explanation Methods in NLP
Thu 03.07 11:00 - 11:45
- Graduate Student Seminar
Bloomfield 526
Abstract: Understanding the causal influence of high-level concepts on NLP model decisions is central to interpretability, yet current evaluation practices often fall short of capturing true causal effects. We introduce LIBERTy, a benchmark designed to rigorously evaluate concept-based explanation methods under controlled interventions. Unlike earlier resources that rely on simplified tasks and isolated edits, LIBERTy simulates realistic multi-domain scenarios, including CV screening, workplace violence prediction, and disease diagnosis, using LLM-generated texts grounded in complex structured causal graphs. Each dataset is accompanied by counterfactuals that enable fine-grained quantitative assessment of the individual causal concept effect. We evaluate leading explanation techniques, such as Matching, LEACE, SHAP, and counterfactual generation, across multiple NLP models, including fine-tuned classifiers and zero-shot LLMs. The results reveal substantial gaps in causal faithfulness across explainers and backbones, and expose persistent limitations in how explanation methods estimate causal effects.
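To illustrate the kind of quantity the counterfactual pairs support, here is a minimal sketch of measuring an individual causal concept effect as the change in a model's output distribution between a text and its concept-level counterfactual. This is an assumption-laden illustration, not LIBERTy's actual API: the function names, the toy model, and the choice of L1 distance between output distributions are all hypothetical.

```python
# Hedged sketch (hypothetical names, not LIBERTy's API): the individual
# causal concept effect of a concept on one example, estimated as the
# distance between the model's output distributions on the factual text
# and on a counterfactual where only that concept was intervened on.
from typing import Callable, List


def individual_causal_concept_effect(
    model: Callable[[str], List[float]],  # text -> class-probability vector
    text: str,
    counterfactual: str,
) -> float:
    """L1 distance between the two output distributions; a larger value
    indicates a stronger causal effect of the edited concept on this
    prediction. (L1 is an illustrative choice of distance.)"""
    p_factual = model(text)
    p_counter = model(counterfactual)
    return sum(abs(a - b) for a, b in zip(p_factual, p_counter))


# Toy stand-in classifier: treats one surface cue as the risk concept.
def toy_model(text: str) -> List[float]:
    risky = "prior convictions" in text
    return [0.2, 0.8] if risky else [0.9, 0.1]


effect = individual_causal_concept_effect(
    toy_model,
    "Applicant has prior convictions and five years of experience.",
    "Applicant has no criminal record and five years of experience.",
)
# `effect` is the L1 gap between the two output distributions
```

A benchmark dataset would pair each text with counterfactuals for several concepts, so averaging such per-example effects against an explainer's attributed importance is one way to score causal faithfulness.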

