TY - JOUR
T1 - A study of unpredictability in fault-tolerant middleware
JF - Computer Networks
Y1 - 2013
A1 - Dumitras, Tudor
A1 - Narasimhan, Priya
KW - Fault tolerance
KW - latency
KW - Middleware
KW - Remote procedure call
KW - Unpredictability
AB - In enterprise applications relying on fault-tolerant middleware, it is a common engineering practice to establish service-level agreements (SLAs) based on the 95th or the 99th percentile of the latency, to allow a margin for unexpected variability. However, the extent of this unpredictability has not been studied systematically. We present an extensive empirical study of unpredictability in 16 distributed systems, ranging from simple transport protocols to fault-tolerant, middleware-based enterprise applications, and we show that the inherent unpredictability in the systems examined arises from at most 1% of the remote invocations. In the normal, fault-free operating mode, most remote invocations have a predictable end-to-end latency, but the maximum latency follows unpredictable trends and is comparable with the time needed to recover from a fault. The maximum latency is not influenced by the system's workload, cannot be regulated through configuration parameters, and is not correlated with the system's resource consumption. The high-latency outliers (up to three orders of magnitude higher than the average latency) have multiple causes and may originate in any component of the system. However, after filtering out the 1% of invocations with the highest recorded response times, the latency becomes bounded with high statistical confidence (p < 0.01). We have verified this result on different operating systems (Linux 2.4, Linux 2.6, Linux-rt, TimeSys), middleware platforms (CORBA and EJB), programming languages (C, C++ and Java), replication styles (active and warm passive) and applications (e-commerce and online gaming). Moreover, this phenomenon occurs at all the layers of middleware-based systems, from the communication protocols to the business logic.
VL - 57
SN - 1389-1286
UR - http://www.sciencedirect.com/science/article/pii/S1389128612003696
CP - 3
J1 - Computer Networks
ER -

TY - CONF
T1 - Prioritizing component compatibility tests via user preferences
T2 - Software Maintenance, 2009. ICSM 2009. IEEE International Conference on
Y1 - 2009
A1 - Yoon, Il-Chul
A1 - Sussman, Alan
A1 - Memon, Atif M.
A1 - Porter, Adam
KW - compatibility testing prioritization
KW - component configurations
KW - computer clusters
KW - Middleware
KW - Middleware systems
KW - object-oriented programming
KW - program testing
KW - software engineering
KW - Software systems
KW - third-party components
KW - user preferences
AB - Many software systems rely on third-party components during their build process. Because the components are constantly evolving, quality assurance demands that developers perform compatibility testing to ensure that their software systems build correctly over all deployable combinations of component versions, also called configurations. However, large software systems can have many configurations, and compatibility testing is often time and resource constrained. We present a prioritization mechanism that enhances compatibility testing by examining the "most important" configurations first, while distributing the work over a cluster of computers. We evaluate our new approach on two large scientific middleware systems and examine tradeoffs between the new prioritization approach and a previously developed lowest-cost-configuration-first approach.
JA - Software Maintenance, 2009. ICSM 2009. IEEE International Conference on
M3 - 10.1109/ICSM.2009.5306357
ER -
TY - CONF
T1 - Capturing Approximated Data Delivery Tradeoffs
T2 - Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Y1 - 2008
A1 - Roitman, H.
A1 - Gal, A.
A1 - Raschid, Louiqa
KW - approximation scheme
KW - approximation theory
KW - biobjective optimization problem
KW - Middleware
KW - middleware data delivery tradeoffs
KW - mobile networks
KW - Pareto optimisation
KW - Pareto set
KW - proxy dilemma problem
KW - sensor networks
AB - This paper presents a middleware data delivery setting with a proxy that is required to maximize the completeness of captured updates, specified in its clients' profiles, while simultaneously minimizing the delay in delivering the updates to clients. The two objectives may conflict when the monitoring budget is limited. Therefore, any solution should consider this tradeoff in satisfying both objectives. We term this problem the "proxy dilemma" and formalize it as a biobjective optimization problem. This problem occurs in many contemporary applications, such as mobile and sensor networks, and poses scalability challenges in delivering up-to-date data from remote resources to meet client specifications. We present a Pareto set as a formal solution to the proxy dilemma. We discuss the complexity of generating a Pareto set for the proxy dilemma and suggest an approximation scheme for this problem.
JA - Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
M3 - 10.1109/ICDE.2008.4497593
ER -

TY - CONF
T1 - Comparing the Performance of High-Level Middleware Systems in Shared and Distributed Memory Parallel Environments
T2 - Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International
Y1 - 2005
A1 - Kim, Jik-Soo
A1 - Andrade, H.
A1 - Sussman, Alan
KW - Application software
KW - Computer science
KW - Computer vision
KW - Data analysis
KW - Distributed computing
KW - distributed computing environment
KW - distributed memory parallel environment
KW - distributed shared memory systems
KW - Educational institutions
KW - high-level middleware system
KW - I/O-intensive data analysis application
KW - Libraries
KW - Middleware
KW - parallel computing environment
KW - parallel library support
KW - parallel memories
KW - programming language
KW - programming languages
KW - Runtime environment
KW - shared memory parallel environment
KW - Writing
AB - The utilization of toolkits for writing parallel and/or distributed applications has been shown to greatly enhance developers' productivity. Such an approach hides many of the complexities associated with writing these applications, rather than relying solely on programming language aids and parallel library support, such as MPI or PVM. In this work, we evaluate three different middleware systems that have been used to implement a computation- and I/O-intensive data analysis application from the domain of computer vision. This study shows the benefits and overheads associated with each of the middleware systems, in different homogeneous computational environments and with different workloads. Our results lead the way toward being able to make better decisions for tuning the application environment, for selecting the appropriate middleware, and for designing more powerful middleware systems to efficiently build and run highly complex applications in both parallel and distributed computing environments.
JA - Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International
PB - IEEE
SN - 0-7695-2312-9
M3 - 10.1109/IPDPS.2005.144
ER -
TY - CONF
T1 - Automated cluster-based Web service performance tuning
T2 - 13th IEEE International Symposium on High Performance Distributed Computing, 2004. Proceedings
Y1 - 2004
A1 - Chung, I.-H.
A1 - Hollingsworth, Jeffrey K.
KW - Active Harmony system
KW - automated performance tuning
KW - business
KW - cluster-based Web service system
KW - Clustering algorithms
KW - Computer science
KW - Educational institutions
KW - electronic commerce
KW - Internet
KW - Middleware
KW - performance evaluation
KW - scalability
KW - Throughput
KW - Transaction databases
KW - Web server
KW - Web services
KW - workstation clusters
AB - Active Harmony provides a way to automate performance tuning. We apply the Active Harmony system to improve the performance of a cluster-based web service system, where the performance improvement cannot easily be achieved by tuning individual components. The experimental results show that there is no single configuration for the system that performs well for all kinds of workloads. By tuning the parameters, Active Harmony helps the system adapt to different workloads and improves performance by up to 16%. For scalability, we demonstrate how to reduce the tuning time for a large system with many tunable parameters. Finally, an algorithm is proposed to automatically adjust the structure of cluster-based web systems, and system throughput is improved by up to 70% using this technique.
JA - 13th IEEE International Symposium on High Performance Distributed Computing, 2004. Proceedings
PB - IEEE
SN - 0-7695-2175-4
M3 - 10.1109/HPDC.2004.1323484
ER -

TY - JOUR
T1 - Formal Modeling Of Middleware-based Distributed Systems
JF - Electronic Notes in Theoretical Computer Science
Y1 - 2004
A1 - Ray, Arnab
A1 - Cleaveland, Rance
KW - distributed systems
KW - Formal Methods
KW - Middleware
KW - Software architecture
AB - Effective design of middleware-based systems requires modeling notations that allow the use of process-interaction schemes provided by different middleware packages directly in designs. Traditional design notations typically support only a fixed class of interprocess interaction schemes, and designers wishing to use them for modeling middleware-based systems must devote significant effort to encoding the middleware primitives in the notation. In this paper, we demonstrate how a new graphical design notation, Architectural Interaction Diagrams (AIDs), which provides parameterized support for different interaction schemes, may be used to model a real-life middleware-based system like the Event Heap coordination infrastructure of the i-Room ubiquitous computing environment.
VL - 108
SN - 1571-0661
UR - http://www.sciencedirect.com/science/article/pii/S1571066104051989
M3 - 10.1016/j.entcs.2004.01.010
ER -

TY - JOUR
T1 - Preserving distributed systems critical properties: a model-driven approach
JF - Software, IEEE
Y1 - 2004
A1 - Yilmaz, C.
A1 - Memon, Atif M.
A1 - Porter, Adam
A1 - Krishna, A. S.
A1 - Schmidt, D. C.
A1 - Gokhale, A.
A1 - Natarajan, B.
KW - configuration management
KW - formal verification
KW - Middleware
KW - middleware suite
KW - model-driven approach
KW - persistent software attributes
KW - QoS requirements
KW - Quality assurance
KW - quality of service
KW - quality-of-service
KW - Skoll distributed computing resources
KW - software configuration
KW - Software maintenance
KW - Software quality
KW - software quality assurance process
KW - system dependability
AB - The need for predictability in distributed systems is most often specified in terms of quality-of-service (QoS) requirements, which help define the acceptable levels of dependability with which capabilities such as processing capacity, data throughput, or service availability reach users. For longer-term properties such as scalability, maintainability, adaptability, and system security, we can similarly use persistent software attributes (PSAs) to specify how and to what degree such properties must remain intact as a network expands and evolves over time. The Skoll distributed continuous software quality assurance process helps to identify viable system and software configurations for meeting stringent QoS and PSA requirements by coordinating the use of distributed computing resources. The authors tested their process using the large, rapidly evolving ACE+TAO middleware suite.
VL - 21
SN - 0740-7459
CP - 6
M3 - 10.1109/MS.2004.50
ER -
TY - JOUR
T1 - An adaptive quality of service aware middleware for replicated services
JF - Parallel and Distributed Systems, IEEE Transactions on
Y1 - 2003
A1 - Krishnamurthy, Sudha
A1 - Sanders, W. H.
A1 - Cukier, Michel
KW - distributed resource sharing
KW - Middleware
KW - online performance monitoring
KW - probabilistic modeling
KW - QoS specification
KW - quality of service
KW - replica consistency
KW - replicated services
KW - resource allocation
KW - time-sensitive client applications
KW - timeliness constraints
AB - A dependable middleware should be able to adaptively share the distributed resources it manages in order to meet diverse application requirements, even when the quality of service (QoS) is degraded due to uncertain variations in load and unanticipated failures. We have addressed this issue in the context of a dependable middleware that adaptively manages replicated servers to deliver a timely and consistent response to time-sensitive client applications. These applications have specific temporal and consistency requirements, and can tolerate a certain degree of relaxed consistency in exchange for better response time. We propose a flexible QoS model that allows clients to specify their timeliness and consistency constraints. We also propose an adaptive framework that dynamically selects replicas to service a client's request based on the prediction made by probabilistic models. These models use the feedback from online performance monitoring of the replicas to provide probabilistic guarantees for meeting a client's QoS specification. The experimental results we have obtained demonstrate the role of feedback and the efficacy of simple analytical models for adaptively sharing the available replicas among the users under different workload scenarios.
VL - 14
SN - 1045-9219
CP - 11
M3 - 10.1109/TPDS.2003.1247672
ER -

TY - CONF
T1 - Improving access to multi-dimensional self-describing scientific datasets
T2 - 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings. CCGrid 2003
Y1 - 2003
A1 - Nam, B.
A1 - Sussman, Alan
KW - Application software
KW - application-specific semantic metadata
KW - Bandwidth
KW - Computer science
KW - database indexing
KW - disk I/O bandwidth
KW - distributed databases
KW - Educational institutions
KW - Indexing
KW - indexing structures
KW - Libraries
KW - meta data
KW - Middleware
KW - multidimensional arrays
KW - multidimensional datasets
KW - Multidimensional systems
KW - NASA
KW - NASA remote sensing data
KW - Navigation
KW - query formulation
KW - self-describing scientific data file formats
KW - structural metadata
KW - very large databases
AB - Applications that query into very large multidimensional datasets are becoming more common. Many self-describing scientific data file formats have also emerged, which have structural metadata to help navigate the multi-dimensional arrays that are stored in the files. The files may also contain application-specific semantic metadata. In this paper, we discuss efficient methods for performing searches for subsets of multi-dimensional data objects, using semantic information to build multidimensional indexes, and group data items into properly sized chunks to maximize disk I/O bandwidth. This work is the first step in the design and implementation of a generic indexing library that will work with various high-dimension scientific data file formats containing semantic information about the stored data. To validate the approach, we have implemented indexing structures for NASA remote sensing data stored in the HDF format with a specific schema (HDF-EOS), and show the performance improvements that are gained from indexing the datasets, compared to using the existing HDF library for accessing the data.
JA - 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings. CCGrid 2003
PB - IEEE
SN - 0-7695-1919-9
M3 - 10.1109/CCGRID.2003.1199366
ER -
TY - CONF
T1 - Performance evaluation of a probabilistic replica selection algorithm
Y1 - 2002
A1 - Krishnamurthy, S.
A1 - Sanders, W. H.
A1 - Cukier, Michel
KW - client-server systems
KW - Dependability
KW - distributed object management
KW - dynamic selection algorithm
KW - Middleware
KW - probabilistic model
KW - probabilistic model-based replica selection algorithm
KW - probability
KW - quality of service
KW - real-time systems
KW - replica failures
KW - round-robin selection scheme
KW - static scheme
KW - time-sensitive distributed applications
KW - timeliness
KW - timing failures
KW - transient overload
AB - When executing time-sensitive distributed applications, a middleware that provides dependability and timeliness is faced with the important problem of preventing timing failures both under normal conditions and when the quality of service is degraded due to replica failures and transient overload on the server. To address this problem, we have designed a probabilistic model-based replica selection algorithm that allows a middleware to choose a set of replicas to service a client based on their ability to meet a client's timeliness requirements. This selection is done based on the prediction made by a probabilistic model that uses the performance history of replicas as inputs. In this paper, we describe the experiments we have conducted to evaluate the ability of this dynamic selection algorithm to meet a client's timing requirements, and we compare it with that of static and round-robin selection schemes under different scenarios.
M3 - 10.1109/WORDS.2002.1000044
ER -
TY - CONF
T1 - Integrating distributed scientific data sources with MOCHA and XRoaster
T2 - Thirteenth International Conference on Scientific and Statistical Database Management, 2001. SSDBM 2001. Proceedings
Y1 - 2001
A1 - Rodriguez-Martinez, M.
A1 - Roussopoulos, Nick
A1 - McGann, J. M.
A1 - Kelley, S.
A1 - Mokwa, J.
A1 - White, B.
A1 - Jala, J.
KW - client-server systems
KW - data sets
KW - data sites
KW - Databases
KW - Distributed computing
KW - distributed databases
KW - distributed scientific data source integration
KW - Educational institutions
KW - graphical tool
KW - hypermedia markup languages
KW - IP networks
KW - java
KW - Large-scale systems
KW - Maintenance engineering
KW - meta data
KW - metadata
KW - Middleware
KW - middleware system
KW - MOCHA
KW - Query processing
KW - remote sites
KW - scientific information systems
KW - user-defined types
KW - visual programming
KW - XML
KW - XML metadata elements
KW - XRoaster
AB - MOCHA is a novel middleware system for integrating distributed data sources that we have developed at the University of Maryland. MOCHA is based on the idea that the code that implements user-defined types and functions should be automatically deployed to remote sites by the middleware system itself. To this end, we have developed an XML-based framework to specify metadata about data sites, data sets, and user-defined types and functions. XRoaster is a graphical tool that we have developed to help the user create all the XML metadata elements to be used in MOCHA.
JA - Thirteenth International Conference on Scientific and Statistical Database Management, 2001. SSDBM 2001. Proceedings
PB - IEEE
SN - 0-7695-1218-6
M3 - 10.1109/SSDM.2001.938560
ER -