Fault-Tolerant Middleware and the Magical 1%

Title: Fault-Tolerant Middleware and the Magical 1%
Publication Type: Book Chapters
Year of Publication: 2005
Authors: Dumitras T, Narasimhan P
Editor: Alonso G
Book Title: Middleware 2005
Series Title: Lecture Notes in Computer Science
Pagination: 431 - 441
Publisher: Springer Berlin Heidelberg
ISBN Number: 978-3-540-30323-7, 978-3-540-32269-6
Keywords: Computer Communication Networks, Information Systems Applications (incl. Internet), Operating Systems, Programming Languages, Compilers, Interpreters, Programming Techniques, Software Engineering

Through an extensive experimental analysis of over 900 possible configurations of a fault-tolerant middleware system, we present empirical evidence that the unpredictability inherent in such systems arises from merely 1% of the remote invocations. The occurrence of very high latencies cannot be regulated through parameters such as the number of clients, the replication style and degree, or the request rates. However, by selectively filtering out a “magical 1%” of the raw observations of various metrics, we show that performance, in terms of measured end-to-end latency and throughput, can be bounded and made easy to understand and control. This simple statistical technique enables us to guarantee, with some level of confidence, bounds for percentile-based quality of service (QoS) metrics, dramatically increasing our ability to tune and control a middleware system in a predictable manner.
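
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not spell out: the function name `trim_top_fraction`, the choice to discard only the largest observations, and the synthetic latency values are all assumptions made for illustration.

```python
import math

def trim_top_fraction(samples, fraction=0.01):
    """Drop the largest `fraction` of samples; return (kept, bound).

    Hypothetical helper: `bound` is the largest remaining observation,
    i.e. an empirical upper bound for the other (1 - fraction) of samples.
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    ordered = sorted(samples)
    if not ordered:
        return [], None
    # Number of observations to discard from the high end.
    drop = math.floor(len(ordered) * fraction)
    kept = ordered[:len(ordered) - drop]
    return kept, kept[-1]

# Synthetic latencies in microseconds (illustrative only, not measured data):
# 990 well-behaved requests plus 10 large outliers.
latencies = [1000 + (i % 50) * 10 for i in range(990)]
latencies += [50_000 + i * 1000 for i in range(10)]

kept, bound = trim_top_fraction(latencies, 0.01)
print("raw maximum latency:", max(latencies))
print("bound after trimming 1%:", bound)
```

In this synthetic data the raw maximum is dominated by the ten outliers, while the bound on the remaining 99% of requests stays close to the typical latency. This mirrors the abstract's observation that a percentile-based metric can be bounded even when the maximum cannot.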