%0 Book Section
%B Middleware 2005
%D 2005
%T Fault-Tolerant Middleware and the Magical 1%
%A Dumitras, Tudor
%A Narasimhan, Priya
%E Alonso, Gustavo
%K Computer Communication Networks
%K Information Systems Applications (incl. Internet)
%K Operating Systems
%K Programming Languages, Compilers, Interpreters
%K Programming Techniques
%K Software Engineering
%X Through an extensive experimental analysis of over 900 possible configurations of a fault-tolerant middleware system, we present empirical evidence that the unpredictability inherent in such systems arises from merely 1% of the remote invocations. The occurrence of very high latencies cannot be regulated through parameters such as the number of clients, the replication style and degree or the request rates. However, by selectively filtering out a “magical 1%” of the raw observations of various metrics, we show that performance, in terms of measured end-to-end latency and throughput, can be bounded, easy to understand and control. This simple statistical technique enables us to guarantee, with some level of confidence, bounds for percentile-based quality of service (QoS) metrics, which dramatically increase our ability to tune and control a middleware system in a predictable manner.
%S Lecture Notes in Computer Science
%I Springer Berlin Heidelberg
%P 431 - 441
%8 2005/01/01/
%@ 978-3-540-30323-7, 978-3-540-32269-6
%G eng
%U http://link.springer.com/chapter/10.1007/11587552_24