Graph Representation Learning for Data Betterment
Tue 19.08 10:00 - 11:00
Graduate Student Seminar
Taub 337
Abstract: Data integration is a fundamental challenge in data-driven systems, aiming to unify heterogeneous sources into a single, coherent view. This field serves as an umbrella for a wide range of tasks, all of which involve reasoning about how disparate data items relate and how they can be meaningfully combined. In recent years, much attention has been directed toward representation learning, where the goal is to encode data items as vector-space embeddings, using, for example, pretrained language models. In our research, we propose leveraging graph-based structures as a complementary tool to capture and exploit relationships between data items. By doing so, we enable the injection of structural semantics into the representation, enhancing its expressiveness and utility for downstream tasks. We explore this approach across several stages of the data integration pipeline, with a particular focus on entity resolution, the task of identifying records that refer to the same real-world entity, under varying and challenging conditions. Additionally, we extend our framework to the problem of feature selection via graph representation learning in the context of a novel task: subgroup-based feature selection. Our findings, grounded in extensive experimentation, demonstrate the value of structure-aware representation learning for data integration and lay the groundwork for tackling more nuanced data challenges ahead.
