Layout-Accurate Design and Implementation of a High-Throughput Interconnection Network for Single-Chip Parallel Processing

Title: Layout-Accurate Design and Implementation of a High-Throughput Interconnection Network for Single-Chip Parallel Processing
Publication Type: Conference Papers
Year of Publication: 2007
Authors: Balkan AO, Horak MN, Qu G, Vishkin U
Conference Name: High-Performance Interconnects, 2007. HOTI 2007. 15th Annual IEEE Symposium on
Date Published: 2007/08
Keywords: MoT interconnection network; Verilog simulations; eXplicit multi-threading; layout-accurate design; mesh of trees network; on-chip XMT processors; on-chip parallel processor; parallel programming; pipeline registers; single-chip parallel processing; hardware description languages; multi-threading; multiprocessor interconnection networks; parallel processing

A mesh of trees (MoT) on-chip interconnection network was recently proposed to provide high throughput between memory units and processors for single-chip parallel processing (Balkan et al., 2006). In this paper, we report our findings in bringing this concept to silicon. Specifically, we conduct cycle-accurate Verilog simulations to verify the analytical results claimed by Balkan et al. (2006). We synthesize MoT interconnection networks of various sizes and obtain their layouts. To further improve throughput, we investigate different arbitration primitives for handling loads and stores, the two most common memory operations. We also study the use of pipeline registers to cope with long wires in large networks. Simulation based on the full network layout demonstrates that significant throughput improvement can be achieved over the originally proposed MoT interconnection network. The importance of this work lies in its validation of the performance features of the MoT interconnection network, which were previously shown to be competitive with traditional network solutions. The MoT network is currently used in an eXplicit multi-threading (XMT) on-chip parallel processor, which is engineered to support parallel programming. In that context, a 32-terminal MoT network could support up to 512 on-chip XMT processors. Our 8-terminal network, which could serve 8 processor clusters (128 processors in total), was also recently accepted for fabrication.
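
The processor counts quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes that each network terminal serves one processor cluster and that every cluster has the same size. The cluster size of 16 is inferred from the abstract's figures (128 processors across 8 clusters), not stated there directly.

```python
def processors_per_cluster(total_processors: int, clusters: int) -> int:
    """Infer a uniform cluster size from a total processor count.

    Assumption: processors are divided evenly among clusters.
    """
    if clusters <= 0 or total_processors % clusters != 0:
        raise ValueError("processors must divide evenly among clusters")
    return total_processors // clusters


def max_processors(terminals: int, cluster_size: int) -> int:
    """Processors reachable when each terminal serves one cluster (assumed)."""
    return terminals * cluster_size


# Figures from the abstract: 8 clusters, 128 processors in total.
cluster_size = processors_per_cluster(128, 8)

# Scaling the same cluster size to a 32-terminal network.
print(cluster_size, max_processors(32, cluster_size))
```

Under these assumptions the inferred cluster size of 16 reproduces the 512-processor figure given for the 32-terminal network.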