An empirical study of a new approach to nearest neighbor searching

TitleAn empirical study of a new approach to nearest neighbor searching
Publication TypeJournal Articles
Year of Publication2001
AuthorsManeewongvatana S, Mount D
JournalAlgorithm Engineering and Experimentation
Pagination172 - 187
Date Published2001///

In nearest neighbor searching we are given a set of n data points in real d-dimensional space, ℜd, and the problem is to preprocess these points into a data structure, so that given a query point, the nearest data point to the query point can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage.In this paper we consider a novel approach to nearest neighbor searching, in which the search returns the correct nearest neighbor with a given probability assuming that the queries are drawn from some known distribution. The query distribution is represented by providing a set of training query points at preprocessing time.
The data structure, called the overlapped split tree, is an augmented BSP-tree in which each node is associated with a cover region, which is used to determine whether the search should visit this node. We use principal component analysis and support vector machines to analyze the structure of the data and training points in order to better adapt the tree structure to the data sets. We show empirically that this new approach provides improved predictability over the kd-tree in average query performance.