TY - CONF
T1 - Decentralized dynamic scheduling across heterogeneous multi-core desktop grids
T2 - 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW)
Y1 - 2010
A1 - Lee, Jaehwan
A1 - Keleher, P.
A1 - Sussman, Alan
KW - backfill jobs
KW - bounded waiting time
KW - Computer science
KW - decentralized dynamic scheduling
KW - desktop grid resource management
KW - Dynamic scheduling
KW - Educational institutions
KW - Environmental management
KW - grid computing
KW - heterogeneous multicore desktop grid
KW - job assignment
KW - job migration
KW - load balancing
KW - Load management
KW - multicore computing environment
KW - Peer to peer computing
KW - Processor scheduling
KW - residual resources
KW - resource allocation
KW - Resource management
KW - scheduling
KW - Scheduling algorithm
KW - Throughput
AB - The recent advent of multi-core computing environments increases both the heterogeneity and complexity of managing desktop grid resources, making efficient load balancing challenging even for a centralized manager. Even with good initial job assignments, dynamic scheduling is still needed to adapt to dynamic environments, as well as for applications whose running times are not known a priori. In this paper, we propose new decentralized scheduling schemes that backfill jobs locally and dynamically migrate waiting jobs across nodes to leverage residual resources, while guaranteeing bounded waiting times for all jobs. The methods attempt to maximize total throughput while balancing load across available grid resources. Experimental results via simulation show that our scheduling scheme has performance competitive with an online centralized scheduler.
JA - 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW)
PB - IEEE
SN - 978-1-4244-6533-0
M3 - 10.1109/IPDPSW.2010.5470877
ER -
TY - CONF
T1 - Lazy binary-splitting: a run-time adaptive work-stealing scheduler
T2 - Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Y1 - 2010
A1 - Tzannes, Alexandros
A1 - Caragea, George C.
A1 - Barua, Rajeev
A1 - Vishkin, Uzi
KW - Dynamic scheduling
KW - load balancing
KW - nested parallelism
KW - thread scheduling
KW - work stealing
AB - We present Lazy Binary Splitting (LBS), a user-level scheduler of nested parallelism for shared-memory multiprocessors that builds on existing Eager Binary Splitting work-stealing (EBS) implemented in Intel's Threading Building Blocks (TBB), but improves performance and ease of programming. In its simplest form (SP), EBS requires manual tuning by repeatedly running the application under carefully controlled conditions to determine a stop-splitting threshold (sst) for every do-all loop in the code. This threshold limits the parallelism and prevents excessive overheads for fine-grain parallelism. Besides being tedious, this tuning also over-fits the code to some particular dataset, platform and calling context of the do-all loop, resulting in poor performance portability for the code. LBS overcomes both the performance-portability and ease-of-programming pitfalls of a manually fixed threshold by adapting dynamically to run-time conditions without requiring tuning. We compare LBS to Auto-Partitioner (AP), the latest default scheduler of TBB, which does not require manual tuning either but lacks context portability, and outperform it by 38.9% using TBB's default AP configuration, and by 16.2% after we tuned AP to our experimental platform. We also compare LBS to SP by manually finding SP's sst using a training dataset and then running both on a different execution dataset.
LBS outperforms SP by 19.5% on average, while allowing for improved performance portability without requiring tedious manual tuning. LBS also outperforms SP with sst=1, its default value when undefined, by 56.7%, and serializing work-stealing (SWS), another work-stealer, by 54.7%. Finally, compared to serializing inner parallelism (SI), which has been used by OpenMP, LBS is 54.2% faster.
JA - Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
T3 - PPoPP '10
PB - ACM
CY - New York, NY, USA
SN - 978-1-60558-877-3
UR - http://doi.acm.org/10.1145/1693453.1693479
M3 - 10.1145/1693453.1693479
ER -
TY - JOUR
T1 - Trade-offs in matching jobs and balancing load for distributed desktop grids
JF - Future Generation Computer Systems
Y1 - 2008
A1 - Kim, Jik-Soo
A1 - Nam, Beomseok
A1 - Keleher, Peter
A1 - Marsh, Michael
A1 - Bhattacharjee, Bobby
A1 - Sussman, Alan
KW - Desktop grid
KW - load balancing
KW - Matchmaking
KW - peer-to-peer computing
KW - Resource discovery
AB - Desktop grids can achieve tremendous computing power at low cost through opportunistic sharing of resources. However, traditional client–server Grid architectures do not deal with all types of failures, and do not always cope well with very dynamic environments. This paper describes the design of a desktop grid implemented over a modified Peer-to-Peer (P2P) architecture. The underlying P2P system is decentralized and inherently adaptable, giving the Grid robustness, scalability, and the ability to cope with dynamic environments, while still efficiently mapping application instances to available resources throughout the system. We use simulation to compare three different types of matching algorithms under differing workloads. Overall, the P2P approach produces significantly lower wait times than prior approaches, while adapting efficiently to the dynamic environment.
VL - 24
SN - 0167-739X
UR - http://www.sciencedirect.com/science/article/pii/S0167739X07001240
CP - 5
M3 - 10.1016/j.future.2007.07.007
ER -
TY - CONF
T1 - Efficient Manipulation of Large Datasets on Heterogeneous Storage Systems
T2 - Parallel and Distributed Processing Symposium, International
Y1 - 2002
A1 - Beynon, Michael D.
A1 - Sussman, Alan
A1 - Kurc, Tahsin
A1 - Catalyurek, Umit
A1 - Saltz, Joel
KW - component-based frameworks
KW - data-intensive computing
KW - load balancing
AB - In this paper we are concerned with the efficient use of a collection of disk-based storage systems and computing platforms in a heterogeneous setting for retrieving and processing large scientific datasets. We demonstrate, in the context of a data-intensive visualization application, how heterogeneity affects performance and show a set of optimization techniques that can be used to improve performance in a component-based framework. In particular, we examine the application of parallelism via transparent copies of application components in the pipelined processing of data.
JA - Parallel and Distributed Processing Symposium, International
PB - IEEE Computer Society
CY - Los Alamitos, CA, USA
VL - 2
SN - 0-7695-1573-8
M3 - 10.1109/IPDPS.2002.1015655
ER -
TY - JOUR
T1 - The design and evaluation of a high-performance earth science database
JF - Parallel Computing
Y1 - 1998
A1 - Shock, Carter T.
A1 - Chang, Chialin
A1 - Moon, Bongki
A1 - Acharya, Anurag
A1 - Davis, Larry S.
A1 - Saltz, Joel
A1 - Sussman, Alan
KW - Communication
KW - High-performance I/O
KW - load balancing
KW - scalability
KW - Scientific databases
AB - Earth scientists have encountered two major obstacles in their attempts to use remotely sensed imagery to analyze the earth's land cover dynamics. First, the volume of data involved is very large and second, significant preprocessing is needed before the data can be used.
This is particularly so for studies that analyze global trends using data sets that cover multiple years. In this paper, we present the design of an earth science database as well as our early experiences with it. The primary design goal of this database is to facilitate efficient access to and preprocessing of large volumes of satellite data. Our initial design assumed that the main bottleneck in the system would be retrieving data from the disks. However, experimental results show that precise identification of all the data values corresponding to a query can take a significant amount of time. The problem is even more pronounced in a design that attempts to minimize time spent performing I/O. We therefore discuss a major redesign of the system that includes a reworking of the indexing scheme and a reorganization of the data on disks. Experimental results show that the redesigned system performs significantly better than the original system, providing interactive response times for local queries.
VL - 24
SN - 0167-8191
UR - http://www.sciencedirect.com/science/article/pii/S0167819197001178
CP - 1
M3 - 10.1016/S0167-8191(97)00117-8
ER -