Tudor Dumitraș

Assistant Professor
ECE Department
University of Maryland, College Park

Fault-Tolerant Communication in Networks-on-Chip

The network-on-chip (NoC) architecture connects multiple heterogeneous cores through an on-chip network instead of a shared bus, and it requires network protocols with end-to-end reliability guarantees. The design of NoC protocols must revisit the core assumptions of large-scale networking: because bandwidth is plentiful and computational resources are scarce, NoC communication can exploit excess network capacity rather than implement sophisticated fault-tolerance schemes [ASP-DAC 2003]. We introduced the first pragmatic approach for fault-tolerant communication in NoC, stochastic communication, which is based on randomized gossip protocols. Stochastic communication provides sustainable throughput and gracefully degrading latency even when up to 70% of network packets are corrupted by soft errors [DATE 2003][VLSI Design 2007]. It advocated a fundamental paradigm shift in chip design: rather than guaranteeing the correctness of devices and interconnects, as traditional approaches do, it tolerates network-on-chip faults at the system level.
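The idea of trading excess network capacity for reliability can be illustrated with a small simulation. The following is a minimal sketch of a generic randomized gossip scheme on a square mesh, not the protocol from the cited papers: the mesh topology, the function names (mesh_neighbors, gossip_rounds), the parameters forward_prob and corrupt_prob, and the assumption that a corrupted packet is detected and silently discarded by the receiver are all illustrative choices.

```python
import random


def mesh_neighbors(tile, side):
    """Return the 4-connected neighbors of `tile` in a side x side mesh."""
    row, col = divmod(tile, side)
    result = []
    if row > 0:
        result.append(tile - side)
    if row < side - 1:
        result.append(tile + side)
    if col > 0:
        result.append(tile - 1)
    if col < side - 1:
        result.append(tile + 1)
    return result


def gossip_rounds(side, source, dest, forward_prob, corrupt_prob, rng,
                  max_rounds=1000):
    """Count rounds until `dest` holds an intact copy of the message.

    Returns 0 if source == dest, and None if the message has not arrived
    after `max_rounds` rounds.
    """
    if source == dest:
        return 0
    informed = {source}
    for round_no in range(1, max_rounds + 1):
        newly_informed = set()
        for tile in sorted(informed):
            for neighbor in mesh_neighbors(tile, side):
                if rng.random() >= forward_prob:
                    continue  # tile chose not to forward on this link
                if rng.random() < corrupt_prob:
                    continue  # assumed: corruption is detected, packet dropped
                newly_informed.add(neighbor)
        informed |= newly_informed
        if dest in informed:
            return round_no
    return None


if __name__ == "__main__":
    rng = random.Random(42)
    # Corner-to-corner delivery on a 4 x 4 mesh with 70% of packets corrupted.
    rounds = gossip_rounds(4, 0, 15, forward_prob=0.5, corrupt_prob=0.7, rng=rng)
    print("rounds until delivery:", rounds)
```

With forward_prob=1.0 and corrupt_prob=0.0 the sketch degenerates into plain flooding, so the message arrives in exactly as many rounds as the Manhattan distance between the two tiles. Raising corrupt_prob lengthens delivery time gradually rather than breaking delivery outright, because redundant copies keep travelling along other paths.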

References

  1. [VLSI Design 2007] P. Bogdan, T. Dumitraș, and R. Mărculescu, “Stochastic Communication: A New Paradigm for Fault-Tolerant Networks-on-Chip,” VLSI Design, special issue on Networks-on-Chip, 2007.

  2. [DATE 2003] T. Dumitraș and R. Mărculescu, “On-Chip Stochastic Communication,” in Design, Automation and Test in Europe (DATE), Munich, Germany, 2003.

  3. [ASP-DAC 2003] T. Dumitraș, S. Kerner, and R. Mărculescu, “Towards On-Chip Fault-Tolerant Communication,” in Asia and South Pacific Design Automation Conference (ASP-DAC), Kitakyushu, Japan, 2003, pp. 225–232.
