Ef?cient Query Evaluation over Temporally Correlated Probabilistic Streams

TitleEf?cient Query Evaluation over Temporally Correlated Probabilistic Streams
Publication TypeConference Papers
Year of Publication2009
AuthorsKanagal B, Deshpande A
Conference NameIEEE 25th International Conference on Data Engineering, 2009. ICDE '09
Date Published2009/04/29/March
ISBN Number978-1-4244-3422-0
KeywordsBirds, Computerized monitoring, correlated probabilistic streams, correlation structure, Data engineering, data mining, Databases, Event detection, Graphical models, inference mechanisms, Markov processes, polynomial time, probabilistic database, probabilistic graphical model, probabilistic query evaluation, query evaluation, query planning algorithm, query plans, Query processing, Random variables, stream processing operator, Streaming media

In this paper, we address the problem of efficient query evaluation over highly correlated probabilistic streams. We observe that although probabilistic streams tend to be strongly correlated in space and time, the correlations are usually quite structured (i.e., the same set of dependencies and independences repeat across time) and Markovian (i.e., the state at time "t+1" is independent of the states at previous times given the state at time "t"). We exploit this observation to compactly encode probabilistic streams by decoupling the correlation structure (the set of dependencies) from the actual probability values. We develop novel stream processing operators that can efficiently and incrementally process new data items; our operators are based on the previously proposed framework of viewing probabilistic query evaluation as inference over probabilistic graphical models (PGMs) [P. Sen and A. Deshpande, 2007]. We develop a query planning algorithm that constructs efficient query plans that are executable in polynomial-time whenever possible, and we characterize queries for which such plans are not possible. Finally we conduct an extensive experimental evaluation that illustrates the advantages of exploiting the structured nature of correlations in probabilistic streams.