Collective entity resolution in relational data

Title: Collective entity resolution in relational data
Publication Type: Journal Articles
Year of Publication: 2007
Authors: Bhattacharya I, Getoor L
Journal: ACM Trans. Knowl. Discov. Data
Volume: 1
Issue: 1
Date Published: 2007/03
ISBN Number: 1556-4681
Keywords: data cleaning, entity resolution, graph clustering, record linkage
Abstract

Many databases contain uncertain and imprecise references to real-world entities. The absence of identifiers for the underlying entities often results in a database that contains multiple references to the same entity. This can lead not only to data redundancy, but also to inaccuracies in query processing and knowledge extraction. These problems can be alleviated through the use of entity resolution. Entity resolution involves discovering the underlying entities and mapping each database reference to these entities. Traditionally, entities are resolved using pairwise similarity over the attributes of references. However, there is often additional relational information in the data. Specifically, references to different entities may co-occur. In these cases, collective entity resolution, in which entities for co-occurring references are determined jointly rather than independently, can improve entity resolution accuracy. We propose a novel relational clustering algorithm that uses both attribute and relational information for determining the underlying domain entities, and we give an efficient implementation. We investigate the impact that different relational similarity measures have on entity resolution quality. We evaluate our collective entity resolution algorithm on multiple real-world databases. We show that it improves entity resolution performance over both attribute-based baselines and algorithms that consider relational information but do not resolve entities collectively. In addition, we perform detailed experiments on synthetically generated data to identify data characteristics that favor collective relational resolution over purely attribute-based algorithms.
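
To make the idea of resolving co-occurring references jointly more concrete, here is a minimal sketch of greedy agglomerative clustering that scores a pair of clusters by a weighted mix of attribute similarity and relational similarity. It is an illustration under stated assumptions, not the algorithm from the article: the linear weight `alpha`, the merge `threshold`, the character-bigram Jaccard measure for names, the Jaccard measure over neighboring clusters, and the function name `collective_resolve` are all hypothetical choices made for this example, and the author names are toy data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bigrams(s):
    """Set of lowercase character bigrams (assumed toy attribute features)."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def collective_resolve(refs, alpha=0.4, threshold=0.55):
    """refs maps ref_id -> (name, group_id); a group is one co-occurrence
    context such as a paper. Returns a list of sets of reference ids."""
    cluster_of = {r: r for r in refs}      # reference id -> cluster label
    members = {r: {r} for r in refs}       # cluster label -> reference ids
    groups = {}
    for r, (_, g) in refs.items():
        groups.setdefault(g, set()).add(r)

    def neighbors(c):
        # Labels of the clusters whose references co-occur with members of c.
        out = set()
        for r in members[c]:
            for other in groups[refs[r][1]]:
                if other != r:
                    out.add(cluster_of[other])
        out.discard(c)
        return out

    def attr_sim(c1, c2):
        # Best name match between any member of c1 and any member of c2.
        return max(jaccard(bigrams(refs[a][0]), bigrams(refs[b][0]))
                   for a in members[c1] for b in members[c2])

    def sim(c1, c2):
        # Assumed combination: weighted sum of attribute and relational parts.
        rel = jaccard(neighbors(c1), neighbors(c2))
        return (1 - alpha) * attr_sim(c1, c2) + alpha * rel

    while True:
        # Similarities are recomputed each round, so earlier merges change
        # the relational evidence available to later decisions.
        best = max(((sim(a, b), a, b)
                    for a, b in combinations(sorted(members), 2)),
                   key=lambda t: t[0], default=None)
        if best is None or best[0] < threshold:
            break
        _, a, b = best
        merged = members.pop(b)
        members[a] |= merged
        for r in merged:
            cluster_of[r] = a
    return sorted(members.values(), key=min)


# Toy data: three papers (p1, p2, p3), each with two author references.
refs = {
    1: ("A. Ansari", "p1"), 2: ("W. Wang", "p1"),
    3: ("A. Ansari", "p2"), 4: ("Wei Wang", "p2"),
    5: ("Wen Wang", "p3"),  6: ("C. Chen", "p3"),
}
collective = collective_resolve(refs, alpha=0.4, threshold=0.55)
attribute_only = collective_resolve(refs, alpha=0.0, threshold=0.55)
```

On this toy input, the collective run first merges the two "A. Ansari" references and then, because references 2 and 4 now share a resolved co-author, merges them as well, giving `[{1, 3}, {2, 4}, {5}, {6}]`. With `alpha=0.0` only names are compared, so "Wei Wang" is instead grouped with the more similar string "Wen Wang", giving `[{1, 3}, {2}, {4, 5}, {6}]`. Recomputing every pairwise score in each round keeps the sketch short but is quadratic per merge; it is not meant to reflect the efficient implementation the abstract mentions.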

URL: http://doi.acm.org/10.1145/1217299.1217304
DOI: 10.1145/1217299.1217304