%0 Conference Paper
%B OOPSLA'09 Proceedings of the 24th ACM SIGPLAN Conference Companion on Object Oriented Programming Systems Languages and Applications
%D 2009
%T Dependable, Online Upgrades in Enterprise Systems
%A Tudor Dumitras
%K data migration
%K Dependability
%K hidden dependencies
%K online upgrades
%K software upgrades
%X Software upgrades are unreliable, often causing downtime or data loss. I propose Imago, an approach for removing the leading causes of upgrade failures (broken dependencies) and of planned downtime (data migrations). While imposing a higher resource overhead than previous techniques, Imago is more dependable and easier to use correctly.
%S OOPSLA '09
%I ACM
%P 743 - 744
%8 2009///
%@ 978-1-60558-768-4
%G eng
%U http://doi.acm.org/10.1145/1639950.1639993

%0 Conference Paper
%D 2002
%T Performance evaluation of a probabilistic replica selection algorithm
%A Krishnamurthy, S.
%A Sanders, W. H.
%A Michel Cukier
%K client-server systems
%K Dependability
%K distributed object management
%K dynamic selection algorithm
%K Middleware
%K probabilistic model
%K probabilistic model-based replica selection algorithm
%K probability
%K quality of service
%K real-time systems
%K replica failures
%K round-robin selection scheme
%K static scheme
%K time-sensitive distributed applications
%K timeliness
%K timing failures
%K transient overload
%X When executing time-sensitive distributed applications, a middleware that provides dependability and timeliness is faced with the important problem of preventing timing failures both under normal conditions and when the quality of service is degraded due to replica failures and transient overload on the server. To address this problem, we have designed a probabilistic model-based replica selection algorithm that allows a middleware to choose a set of replicas to service a client based on their ability to meet a client's timeliness requirements. This selection is done based on the prediction made by a probabilistic model that uses the performance history of replicas as inputs. In this paper, we describe the experiments we have conducted to evaluate the ability of this dynamic selection algorithm to meet a client's timing requirements, and compare it with that of a static scheme and a round-robin selection scheme under different scenarios.
%P 119 - 127
%8 2002///
%G eng
%R 10.1109/WORDS.2002.1000044

%0 Journal Article
%J Parallel and Distributed Systems, IEEE Transactions on
%D 2001
%T An adaptive algorithm for tolerating value faults and crash failures
%A Ren, Yansong
%A Michel Cukier
%A Sanders, W. H.
%K active replication communication
%K adaptive algorithm
%K adaptive fault tolerance
%K adaptive majority voting algorithm
%K AQuA architecture
%K client-server systems
%K CORBA
%K crash failures
%K data consistency
%K data integrity
%K Dependability
%K distributed object management
%K fault tolerant computing
%K objects replication
%K value faults
%X The AQuA architecture provides adaptive fault tolerance to CORBA applications by replicating objects and providing a high-level method that an application can use to specify its desired level of dependability. This paper presents the algorithms that AQuA uses, when an application's dependability requirements can change at runtime, to tolerate both value faults in applications and crash failures simultaneously. In particular, we provide an active replication communication scheme that maintains data consistency among replicas, detects crash failures, collates the messages generated by replicated objects, and delivers the result of each vote. We also present an adaptive majority voting algorithm that enables the correct ongoing vote while both the number of replicas and the majority size dynamically change. Together, these two algorithms form the basis of the mechanism for tolerating and recovering from value faults and crash failures in AQuA.
%B Parallel and Distributed Systems, IEEE Transactions on
%V 12
%P 173 - 192
%8 2001/02//
%@ 1045-9219
%G eng
%N 2
%R 10.1109/71.910872