Entity resolution in geospatial data integration

Title: Entity resolution in geospatial data integration
Publication Type: Conference Papers
Year of Publication: 2006
Authors: Sehgal V, Getoor L, Viechnicki PD
Conference Name: Proceedings of the 14th annual ACM international symposium on Advances in geographic information systems
Date Published: 2006
Conference Location: New York, NY, USA
ISBN Number: 1-59593-529-0

Due to the growing availability of geospatial data from a wide variety of sources, there is a pressing need for robust, accurate and automatic merging and matching techniques. Geospatial Entity Resolution is the process of determining, from a collection of database sources referring to geospatial locations, a single consolidated collection of 'true' locations. At the heart of this process is the problem of determining when two location references match, i.e., when they refer to the same underlying location. In this paper, we introduce a novel method for resolving location entities in geospatial data. A typical geospatial database contains heterogeneous features such as location name, spatial coordinates, location type and demographic information. We investigate the use of all of these features in algorithms for geospatial entity resolution. Entity resolution is further complicated by the fact that different sources may use different vocabularies for describing location types, so a semantic mapping between them is required. We propose a novel approach which learns how to combine the different features to perform accurate resolutions. We present experimental results showing that methods combining spatial and non-spatial features (e.g., location name and location type) outperform methods based on spatial or name information alone.
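As an illustration of the idea in the abstract, the following minimal sketch scores a pair of location references by combining a name similarity, a spatial proximity and a type agreement feature. It is not the authors' method: the record layout, the `type_map` vocabulary mapping, the `scale_km` decay constant, and the weights and bias are all hypothetical. In a learned approach the weights would be fitted from labelled matching and non-matching pairs rather than set by hand.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in for any name measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def features(rec_a, rec_b, type_map, scale_km=1.0):
    """Build the feature vector [name, spatial, type] for a pair of records.

    type_map is a hypothetical semantic mapping that translates each source's
    location-type vocabulary into a shared one.
    """
    name = name_similarity(rec_a["name"], rec_b["name"])
    dist = haversine_km(rec_a["lat"], rec_a["lon"], rec_b["lat"], rec_b["lon"])
    spatial = math.exp(-dist / scale_km)  # 1.0 at zero distance, decays with distance
    type_a = type_map.get(rec_a["type"], rec_a["type"])
    type_b = type_map.get(rec_b["type"], rec_b["type"])
    type_match = 1.0 if type_a == type_b else 0.0
    return [name, spatial, type_match]


def match_score(feats, weights, bias):
    """Logistic combination of the features; weights here are assumed, not learned."""
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical records from two sources that use different type vocabularies.
a = {"name": "Springfield Elementary School", "lat": 38.9000, "lon": -77.0000, "type": "school"}
b = {"name": "Springfield Elem. School", "lat": 38.9005, "lon": -77.0003, "type": "educational institution"}
c = {"name": "Riverside Hospital", "lat": 40.0000, "lon": -75.0000, "type": "hospital"}
type_map = {"educational institution": "school"}

weights, bias = [2.0, 2.0, 1.0], -2.5  # illustrative values only
score_ab = match_score(features(a, b, type_map), weights, bias)
score_ac = match_score(features(a, c, type_map), weights, bias)
print(f"a vs b: {score_ab:.2f}, a vs c: {score_ac:.2f}")
```

A pair would be declared a match when its score exceeds a chosen threshold such as 0.5. Dropping a feature from the vector gives a spatial-only or name-only baseline of the kind the abstract compares against.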