Processing large-scale multi-dimensional data in parallel and distributed environments

Title	Processing large-scale multi-dimensional data in parallel and distributed environments
Publication Type	Journal Articles
Year of Publication	2002
Authors	Beynon M, Chang C, Catalyurek U, Kurc T, Sussman A, Andrade H, Ferreira R, Saltz J
Journal	Parallel Computing
Volume	28
Issue	5
Pagination	827 - 859
Date Published	2002/05
ISSN	0167-8191
Keywords	Data-intensive applications, Distributed computing, Multi-dimensional datasets, Parallel processing, Runtime systems
Abstract	Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.
URL	http://www.sciencedirect.com/science/article/pii/S0167819102000972
DOI	10.1016/S0167-8191(02)00097-2
