Looking to Parallel Algorithms for ILP and Decentralization

Title: Looking to Parallel Algorithms for ILP and Decentralization
Publication Type: Reports
Year of Publication: 1998
Authors: Berkovich E, Jacob B, Nuzman J, Vishkin U
Date Published: 1998/10/15
Institution: Department of Computer Science, University of Maryland, College Park
Keywords: Technical Report

We introduce explicit multi-threading (XMT), a decentralized architecture
that exploits fine-grained SPMD-style programming; an SPMD program can
translate directly to MIPS assembly language using three additional
instruction primitives. The motivation for XMT is: (i) to define an
inherently decentralizable architecture, taking into account that the
performance of future integrated circuits will be dominated by wire costs,
(ii) to increase available instruction-level parallelism (ILP) by
leveraging expertise in the world of parallel algorithms, and (iii) to
reduce hardware complexity by alleviating the need to detect ILP at
run-time: if parallel algorithms can give us an overabundance of work to
do in the form of thread-level parallelism, one can extract
instruction-level parallelism with greatly simplified dependence-checking.
We show that implementations of such an architecture tend towards
decentralization and that, when global communication is necessary, overall
performance is relatively insensitive to large on-chip delays. We compare
the performance of the design to more traditional parallel architectures
and to a high-performance superscalar implementation, but the intent is
merely to illustrate the performance behavior of the organization and to
stimulate debate on the viability of introducing SPMD to the single-chip
processor domain. We cannot offer at this stage hard comparisons with
well-researched models of execution.
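To make the spawn-join programming style concrete, here is a minimal, hypothetical sketch (not from the report) that serially simulates XMT-style execution: one virtual thread is spawned per array element, and each thread claims a unique output slot through an atomic prefix-sum (fetch-and-add) primitive before all threads join. The function names and the compaction task are illustrative assumptions, not the report's actual instruction set.

```python
def prefix_sum(counter, increment):
    # Models the atomic prefix-sum instruction primitive: return the old
    # counter value and advance the counter by `increment`.
    old = counter[0]
    counter[0] += increment
    return old

def compact_nonzero(a):
    # "Spawn" one virtual thread per element; each thread holding a nonzero
    # value claims a unique slot via prefix-sum; then all threads "join".
    counter = [0]
    out = [None] * len(a)
    for i in range(len(a)):  # each iteration models one spawned thread
        if a[i] != 0:
            slot = prefix_sum(counter, 1)
            out[slot] = a[i]
    return out[:counter[0]]
```

Because each virtual thread's work is independent except for the prefix-sum, the iterations could in principle execute in parallel; the serial loop here only stands in for the hardware's thread scheduling.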
When programming for the SPMD model, the total number of operations the
processor must perform is often slightly higher. Offsetting this, we have
observed that the length of the critical path through the dynamic
execution graph is shorter than in the serial domain, and the amount of
ILP is correspondingly larger. Fine-grained SPMD programming connects with
a broad knowledge base in parallel algorithms and scales down to provide
good performance relative to high-performance superscalar designs even
with small input sizes and small numbers of functional units.
Keywords: Fine-grained SPMD, parallel algorithms, spawn-join, prefix-sum,
instruction-level parallelism, decentralized architecture.
(Also cross-referenced as UMIACS-TR-98-40)