Efficient Serial and Parallel Algorithms for Querying Large Scale Multidimensional Time Series Data

TitleEfficient Serial and Parallel Algorithms for Querying Large Scale Multidimensional Time Series Data
Publication TypeReports
Year of Publication2004
AuthorsJaJa JF, Kim J, Wang Q
Date Published2004///
InstitutionInstititue for Advanced Computer Studies, Univ of Maryland, College Park
Abstract

We consider the problem of querying large scale multidimensional time series data to discover events of interest,test and validate hypotheses, or to associate temporal patterns with specific events. Multidimensional time series
data is growing at an extremely fast rate due to a number of trends including a recent strong interest in collecting
and analyzing time series of business, scientific, demographic, and simulation data. The ability to explore such
collections interactively, even at a coarse level, will be critical to the process of extracting some of the information
and knowledge embedded in such collections. We develop indexing techniques and search algorithms to efficiently
handle temporal range value querying of multidimensional time series data. Our indexing uses linear space data
structures that enable the handling of queries very efficiently, invoking in the worst case a logarithmic number
of queries to single time steps. We also show that our algorithm is ideally suited for parallel implementation on
clusters of processors achieving a linear speedup in the number of available processors. A particularly simple data
structure with provably good bounds is also presented for the case when the number of multidimensional objects is
relatively small. These techniques improve significantly over previous techniques for either the serial or the parallel
case, and are evaluated by extensive experimental results that confirm their superior performance. In particular, we
achieve query times in the order of hundreds of milliseconds on a (relatively outdated) cluster of 16 processors
for 140GB of data consisting of 160, 000 distinct time series of 16-dimensional points, each time series being of
length 10, 000.