TY - JOUR
T1 - AQuA: an adaptive architecture that provides dependable distributed objects
JF - Computers, IEEE Transactions on
Y1 - 2003
A1 - Ren, Yansong
A1 - Bakken, D. E.
A1 - Courtney, T.
A1 - Cukier, Michel
A1 - Karr, D. A.
A1 - Rubel, P.
A1 - Sabnis, C.
A1 - Sanders, W. H.
A1 - Schantz, R. E.
A1 - Seri, M.
KW - active replication pass-first scheme
KW - adaptive architecture
KW - adaptive fault tolerance
KW - AQuA
KW - CORBA
KW - data consistency
KW - data integrity
KW - dependable distributed objects
KW - distributed object management
KW - performance measurements
KW - quality of service
KW - replicated dependability manager
KW - replication schemes
KW - software fault tolerance
KW - system resources
AB - Building dependable distributed systems from commercial off-the-shelf components is of growing practical importance. For both cost and production reasons, there is interest in approaches and architectures that facilitate building such systems. The AQuA architecture is one such approach; its goal is to provide adaptive fault tolerance to CORBA applications by replicating objects. The AQuA architecture allows application programmers to request desired levels of dependability during applications' runtimes. It provides fault tolerance mechanisms to ensure that a CORBA client can always obtain reliable services, even if the CORBA server object that provides the desired services suffers from crash failures and value faults. AQuA includes a replicated dependability manager that provides dependability management by configuring the system in response to applications' requests and changes in system resources due to faults. It uses Maestro/Ensemble to provide group communication services. It contains a gateway to intercept standard CORBA IIOP messages to allow any standard CORBA application to use AQuA. It provides different types of replication schemes to forward messages reliably to the remote replicated objects. All of the replication schemes ensure strong data consistency among replicas. This paper describes the AQuA architecture and presents, in detail, the active replication pass-first scheme. In addition, the interface to the dependability manager and the design of the dependability manager replication are also described. Finally, we describe performance measurements that were conducted for the active replication pass-first scheme, and we present results from our study of fault detection, recovery, and blocking times.
VL - 52
SN - 0018-9340
CP - 1
M3 - 10.1109/TC.2003.1159752
ER -

TY - JOUR
T1 - An adaptive algorithm for tolerating value faults and crash failures
JF - Parallel and Distributed Systems, IEEE Transactions on
Y1 - 2001
A1 - Ren, Yansong
A1 - Cukier, Michel
A1 - Sanders, W. H.
KW - active replication communication
KW - adaptive algorithm
KW - adaptive fault tolerance
KW - adaptive majority voting algorithm
KW - AQuA architecture
KW - client-server systems
KW - CORBA
KW - crash failures
KW - data consistency
KW - data integrity
KW - Dependability
KW - distributed object management
KW - fault tolerant computing
KW - objects replication
KW - value faults
AB - The AQuA architecture provides adaptive fault tolerance to CORBA applications by replicating objects and providing a high-level method that an application can use to specify its desired level of dependability. This paper presents the algorithms that AQuA uses, when an application's dependability requirements can change at runtime, to tolerate both value faults in applications and crash failures simultaneously. In particular, we provide an active replication communication scheme that maintains data consistency among replicas, detects crash failures, collates the messages generated by replicated objects, and delivers the result of each vote. We also present an adaptive majority voting algorithm that enables the correct ongoing vote while both the number of replicas and the majority size dynamically change. Together, these two algorithms form the basis of the mechanism for tolerating and recovering from value faults and crash failures in AQuA.
VL - 12
SN - 1045-9219
CP - 2
M3 - 10.1109/71.910872
ER -

TY - CONF
T1 - Proteus: a flexible infrastructure to implement adaptive fault tolerance in AQuA
Y1 - 1999
A1 - Sabnis, C.
A1 - Cukier, Michel
A1 - Ren, J.
A1 - Rubel, P.
A1 - Sanders, W. H.
A1 - Bakken, D. E.
A1 - Karr, D.
KW - adaptive fault tolerance
KW - AQuA
KW - commercial off-the-shelf components
KW - CORBA applications
KW - cost
KW - dependable distributed systems
KW - distributed object management
KW - object replication
KW - proteus
KW - reconfigurable architectures
KW - Runtime
KW - Software architecture
KW - software fault tolerance
AB - Building dependable distributed systems from commercial off-the-shelf components is of growing practical importance. For both cost and production reasons, there is interest in approaches and architectures that facilitate building such systems. The AQuA architecture is one such approach; its goal is to provide adaptive fault tolerance to CORBA applications by replicating objects, providing a high-level method for applications to specify their desired dependability, and providing a dependability manager that attempts to reconfigure a system at runtime so that dependability requests are satisfied. This paper describes how dependability is provided in AQuA. In particular, it describes Proteus, the part of AQuA that dynamically manages replicated distributed objects to make them dependable. Given a dependability request, Proteus chooses a fault tolerance approach and reconfigures the system to try to meet the request. The infrastructure of Proteus is described in this paper, along with its use in implementing active replication and a simple dependability policy.
M3 - 10.1109/DCFTS.1999.814294
ER -