Ranking continuous probabilistic datasets

TitleRanking continuous probabilistic datasets
Publication TypeJournal Articles
Year of Publication2010
AuthorsLi J, Deshpande A
JournalProc. VLDB Endow.
Pagination638 - 649
Date Published2010/09//
ISBN Number2150-8097

Ranking is a fundamental operation in data analysis and decision support, and plays an even more crucial role if the dataset being explored exhibits uncertainty. This has led to much work in understanding how to rank uncertain datasets in recent years. In this paper, we address the problem of ranking when the tuple scores are uncertain, and the uncertainty is captured using continuous probability distributions (e.g. Gaussian distributions). We present a comprehensive solution to compute the values of a parameterized ranking function (PRF) [18] for arbitrary continuous probability distributions (and thus rank the uncertain dataset); PRF can be used to simulate or approximate many other ranking functions proposed in prior work. We develop exact polynomial time algorithms for some continuous probability distribution classes, and efficient approximation schemes with provable guarantees for arbitrary probability distributions. Our algorithms can also be used for exact or approximate evaluation of k-nearest neighbor queries over uncertain objects, whose positions are modeled using continuous probability distributions. Our experimental evaluation over several datasets illustrates the effectiveness of our approach at efficiently ranking uncertain datasets with continuous attribute uncertainty.