Context Assisted Matching

Tue 19.08 11:45 - 12:15

Abstract: Entity matching (EM) is a well-studied problem, crucial for data integration and preparation. It can be framed as a binary classification task: given two data items, usually records from a table, one must decide whether they refer to the same real-world entity. Variations of the EM problem apply to non-tabular data and use matching criteria other than reference to the same real-world entity. In this thesis, we focus on settings where entities can be described by a predicate over the data. For example, in an academic research domain, a pair of papers can be matched if both papers belong to the same scientific field. In addition, we tackle the challenge of deriving representations from long textual data, where the cost of processing and subsequently representing the data may be significant. In this setting of predicate-based matching of long texts, representing and matching data items must rely on underlying properties of the data while remaining computationally feasible. In this work we present a model and an end-to-end pipeline for this task, including a representation algorithm and a matching approach that together aim to provide an effective representation efficiently.
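To make the idea of predicate-based matching concrete, here is a minimal sketch in Python. It is not the model or pipeline presented in the talk. The keyword lists, the `Paper` type, and the functions `predict_field` and `is_match` are hypothetical, and the keyword lookup is a toy stand-in for a learned representation of long texts. The sketch only illustrates the framing: the matcher is a binary classifier over pairs, and two papers match when the predicate "scientific field" takes the same value for both.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists; a toy stand-in for a learned text representation.
FIELD_KEYWORDS = {
    "databases": {"query", "schema", "index", "transaction"},
    "machine learning": {"neural", "training", "gradient", "classifier"},
}


@dataclass(frozen=True)
class Paper:
    title: str
    text: str


def predict_field(paper: Paper) -> Optional[str]:
    """Evaluate the predicate: return the best-scoring field, or None if no keyword hits."""
    tokens = set(paper.text.lower().split())
    scores = {field: len(tokens & kws) for field, kws in FIELD_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def is_match(a: Paper, b: Paper) -> bool:
    """Binary decision: the pair matches if both papers share a known field."""
    field_a, field_b = predict_field(a), predict_field(b)
    return field_a is not None and field_a == field_b
```

Note that two papers about entirely different systems can match under this criterion, which is what distinguishes it from classic EM, where a match means both items refer to the same real-world entity.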

Speaker

Noam Zel

Technion

  • Advisors: Avigdor Gal

  • Academic Degree: M.Sc.