Top | Programmer's Model | On-Line Tutorial | Software Release | Teaching Parallelism & Full On-Line Course | Talks | Publications | Team

Explicit Multi-Threading (XMT): A PRAM-On-Chip Vision

A Desktop Supercomputer


In the past, mainstream computers used a single processor core. Nearly a decade ago all mainstream commercial computers switched to using multiple cores (due to technical reasons, such as overheating), and became known as multi-cores. Consequently, the XMT vision for a desktop many-core supercomputer, introduced in 1997, became relevant for mainstream computing. XMT is as timely in 2013: policy reports, such as The Future of Computing Performance: Game Over or Next Level?, recognize that the field is desperate for alternative multi-core ideas, as well as competition. To appreciate how serious the situation is, note the extremely low level of application innovation in general-purpose desktop computing (excluding graphics) during the last decade (e.g., in comparison to the internet and mobile markets). The main reasons for the current situation are: 1. Commercial multi-core machines have, so far, failed to generate a broad base among application programmers; most programmers simply find these multi-cores too difficult to program effectively; indeed, the Game Over report points out that Only heroic programmers can exploit the vast parallelism in today's machines . 2. In turn, the repelling of programmers took the steam out of desktop computing: not having enough ready-to-deploy applications for multi-cores weakened the business case for larger investment in better multi-core hardware. It is also relevant to point out that by the time competition among machine vendors was needed to give the market true choice towards the transition to multi-cores, most brand names of the 1990s (e.g., Apple, DEC, IBM, Motorola, Sun) have already dropped out from the desktop hardware market.

XMT aspires to break this impasse. Rather than sidestep ease-of-programming, or treat it as afterthought, XMT builds on PRAM -- the foremost parallel algorithmic theory to date. XMT appears to be the only active architecture research project in the computer science community today that handles ease-of-programming as a first class constraint.

2012 status report. Using our hardware and software prototypes of XMT, we recently demonstrated parallel speedups that beat all current platforms by orders of magnitude on the most advanced parallel algorithms in the literature, including for max-flow, graph biconnectivity and graph triconnectivity. This stress test validates our 30-year old hypothesis that the PRAM algorithmic theory can save parallel computing from its programming woes, while traversing an architecture narrative that has belittled for decades the PRAM theory, stipulating that it is too simplistic for any existing or future parallel machine. The XMT/PRAM advantage is especially important for the so-called, irregular, fine-grained programs that all commercial platforms to date fail to support effectively.

Few quick links


* For an updated overview see: U. Vishkin. Using simple abstraction to reinvent computing for parallelism. Communications of the ACM (CACM) 54,1, pages 75-85, January, 2011. Anecdote: The ACM recorded 6,687 downloads of this paper in the first year following its publication.
* The most recent XMT presentation, entitled Using Simple Abstraction to Guide the Reinvention of Computing for Parallelism, was delivered as a Parallel@Illinois Distinguished Lecture and a Computer Science Colloquium, University of Illinois, Urbana-Champaign, March, 2010, as well as at the Hebrew University, May 2010, University of Vienna, June 2010 and Columbia University, September 2010. Click for: abstract and video, and from the Parallel@Illinois Archives, or for slides, from the Columbia talk.
* Parallel Programming Talk on Intel Software Network TV, Episode 80, June 8, 2010. Click to download.
* Interview, recorded at SIGCSE 2010 and featured at Teach Parallel on Intel Software Network TV. Click to download.
* Click for a paper representing the 2007/8 generation of XMT (pdf, 12 pages).
* Click for a June 2006 introduction to XMT.
* Click for an older version of this page .


Navigate this page:


Top | Programmer's Model | On-Line Tutorial | Software Release | Teaching Parallelism & Full On-Line Course | Talks | Publications | Team

The XMT programmer's model, in a nutshell

The XMT programmer's model is unique as it incorporates an algorithmic model, the parallel random-access machine/model (PRAM), into the programming "workflow". The wealth of the PRAM theory of algorithms is well documented. The XMT project has been driven by a PRAM-On-Chip vision, seeking to build an easy-to-program parallel computer comprising thousands of processors on a single chip, using PRAM-like programming. Interestingly, starting with a PRAM might not have been an obvious choice. Technology constraints guide us away from tightly coupled concurrency in programs; e.g., away from the PRAM and towards multi threading. On the other hand, relaxed concurrency in programs is notoriously difficult to design or analyze for correctness or performance. The XMT programming approach incorporates an elegant workaround. XMT provides a work flow from a PRAM algorithm to an XMT program, and even to fine-tuning an XMT program for performance. Given a problem, a PRAM style parallel algorithm is developed for it using the Shiloach-Vishkin 1982 Work-Depth (WD) Methodology. All the operations that can be concurrently performed in the first round are noted, followed by those that can be performed in the second round, and so on. See the figure below that contrasts this "natural" (parallel) algorithm with the serial, one operation at a time, doctrine. Such synchronous description of a parallel algorithm makes it easy to reason about correctness and analyze for work (the total number of operations) and depth (number of rounds). The XMT programmer is then expected to use the XMTC language (basically C with two additional commands) for producing a multi-threaded program. Reasoning about correctness or performance can now be restricted to just comparison of the program with the WD algorithm, assuming that correctness and performance of the algorithm have been established, often a much easier task than directly analyzing the program. This workaround allowed college freshmen and even high-school students solve the same problems they get in typical freshmen serial programming course assignments using (XMT) parallel programming.

The programming side is quite simple. See the figure below that depicts serial and parallel modes and spawn and join commands. Its primary example is a standard C programming language variant called XMTC that adds only two basic commands. The main added command is spawn. The program alternates between serial mode and parallel mode. The spawn command can declare any number of concurrent threads and causes a switch from serial mode to parallel mode. Each thread advances through its program at its own speed till termination. When all threads terminate (depicted as Join in the figure), the program switches back to serial mode. Successive spawn commands can each declare a different number of threads. This brief description is not meant to replace a fuller description that can be found in technical presentations and papers below. For example, this brief description suppresses: (i) the second added command, (ii) the classification of XMTC as a so-called single-program multiple-data (SPMD) language, (iii) the so-called independence-of-order (IOS) semantics of XMTC, and, of course, (iv) how the program is compiled and implemented in hardware.

Why is the problem that XMT solves of broad interest?

General-purpose computing (your personal computer) is one of the biggest success stories of technology as a business model. Intel's former CEO, Andy Grove, coined the term Software Spiral to describe the following process: improvements in hardware lead to improvements in software that lead back to improvements in hardware. Dr. Grove noted that the software spiral has been a powerful engine of sustained growth for general-purpose computing for quite a few decades. The bad news is that this spiral is now broken. Until recently, programmers could assume that computers execute one operation at time. To programmers, the improvements in hardware simply looked as executing single operations faster and faster. Software developers (programmers) could continue to write programs in the same way, knowing that the next generation of hardware will run them faster. Technology constraints are now forcing the industry to base all its new hardware on parallel computing, which executes many concurrent operations at a time. This hardware requires a complete overhaul of the way that software needs to be written. Unfortunately, as hardware vendors recognize, this hardware can be too demanding on software developers. For example, Chuck Moore, Senior Fellow, AMD said in 2008: "To make effective use of multicore hardware today, you need a PhD in computer science". Evidence that not much has changed is reflected in the following July 2010 title by Prof. David Patterson, University of California, Berkeley: The Trouble with Multicore: Chipmakers are busy designing microprocessors that most programmers can't handle.

Keeping the general-purpose "Software Spiral" on track, which requires reinventing both software and hardware platforms for parallel computing, is one of the biggest challenges of our times. Parallel software productivity problems are breaking the spiral, and failing to resolve the problem can cause a significant recession in a key component of the economy.

What specific problem does XMT solve? And what is unique about the XMT solution?

XMT provides a complete solution for the way future computer should be built (hardware) and programmed (software). Unlike other contemporary computers, XMT is based on the only approach that was endorsed by the theory of computer science for addressing the intellectual effort of developing parallel computer programs (or parallel algorithms). The advanced computers that the industry currently builds are very difficult to program, while even high school student can program XMT. In fact, XMT programming has been successfully taught to high school students. To appreciate the significance, simply contrast this fact with the above quote by Mr. Moore, AMD or Prof. Patterson's title.

Explaining parallel computing at Middle School:

What Grown-Ups Do Not Understand About Parallel Programming - The Peanut Butter and Jelly Story.

High School Teacher on His Experience with XMT

For a 4-part interview with Shane Torbert a Computer Science and Mathematics teacher at Thomas Jefferson High School for Science and Technology, Alexandria, Virginia, click Part 1: Why XMT (m4v player, 4 minutes) , Part 2: Ease of Use (m4v player, 3 minutes) , Part 3: Content Focus (m4v player, 3 minutes) , and Part 4: Final word to teachers (m4v player, 3 minutes) . Mr. Torbert studied by himself (PRAM algorithm and) XMT programming from material available through this webpage. The interview was presented in a Keynote at the CS4HS Summer Workshop at Carnegie-Mellon University that can be found under Talks below. For a paper on his experience, see the Parallel Computing session at SIGCSE, Milwaukee, WI, March 2010.

High School Student on His Experience with XMT
- I was motivated to solve all the XMT programming assignments we got, since I had to cope with solving the algorithmic problems themselves which I enjoy doing. In contrast, I did not see the point of programming other parallel systems available to us at school, since too much of the programming was effort getting around the way the systems were engineered, and this was not fun.
Jacob Hurwitz, 10th grader, Montgomery Blair High School Magnet Program, Silver Spring, Maryland, December 2007. Jacob was one of the students who completed all programming assignments, in the Informal Parallel Programming Course for High School Students, offered in Fall 2007. These challenging high school assignments included all programming assignments in the UMD graduate course on parallel algorithms.

Some Quotes from Experts

- The single-chip supercomputer prototype built by Prof. Uzi Vishkin's group uses rich algorithmic theory to address the practical problem of building an easy-to-program multicore computer. Vishkin's XMT chip reincarnates the PRAM algorithmic technology developed over the past 25 years, uniting the theory of yesterday with the reality of today.
Charles E. Leiserson, Professor of Computer Science, MIT, June 2007.

- I am happy to hear of the impending announcement of a 64 processor prototype of Uzi Vishkin's XMT architecture. This system represents a significant improvement in generality and flexibility for parallel computer systems because of its unique ability to exploit fine-grain concurrency. It will be able to exploit a wider spectrum of parallel algorithms than today's microprocessors can, and this in turn will help bring general purpose parallel computing closer to reality.
Burton Smith, Technical Fellow, Microsoft Corporation, June 2007.

- Today's multi-core processors support coarse grain parallelism. Professor Vishkin has defined a new parallel architecture that supports extremely fine-grained threading. On XMT, a program can be profitably decomposed into tasks as small as 10 instructions. With a complete programming model and an impressive FPGA-based prototype, Professor Vishkin is proposing a compelling alternative design for future microprocessors.
Geoff Lowney, Intel Fellow, April 2008.

Credit should go to the whole XMT team


Top | Programmer's Model | On-Line Tutorial | Software Release | Teaching Parallelism & Full On-Line Course | Talks | Publications | Team

On-Line Tutorial on Parallel Programming through Parallel Algorithms

The following 8 items are available for download following a full day tutorial to high-school students at the University of Maryland on September 15, 2007. The 9th item is a September 2008 software release of of the XMT environment. The 10th item links to the selection of material in a recent high-school course on the topic. By Summer 2009, 100 K-12 students had already studied XMTC programming.
(1)
Announcement of the tutorial.
(2) Short description of the tutorial, same as for the June 17 Seattle tutorial, noted below.
(3) 72 slides of the September 15, 2007 tutorial.
(4) Video recording of the September 15, 2007 high-school tutorial in 4 parts. The morning session is in Part 1A (100 minutes), and Part 1B (95 minutes). Then the students had a 30 minutes hands-on session on the XMT desktop supercomputer (1st picture and 2nd picture), followed by heading back to the lecture room (3rd picture), Part 2A (109 minutes), and Part 2B (10 minutes).
In addition, the following 3 documents provide a basis for parallel algorithms, as well as for parallel programming in XMTC:
(5) Classnotes of the course on parallel algorithms: U. Vishkin, Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques, 104 pages. These class notes are meant to make the fundamental theory of parallel algorithms accessible to computer science and engineering graduate students. They have also been tested successfully for upper division undergraduate students.
(6) XMTC Manual and
(7) XMTC Tutorial
(8) Programming assignments . Please look up under assignments on Home page of Informal Parallel Programming Course for High School Students, Fall 2007. For XMTC code examples download this paper (pdf, 62 pages). Finally, instructors can request more code examples by e-mail.
(9) Software release of: (i) a cycle-accurate simulator of the XMT machine, (ii) a compiler from XMTC to that machine, and (iii) several other software tools. This public software release allows you to use your own computer for programming on an XMT environment and experimenting with it.
(10) Web site of a course on parallel algorithms using the XMT platform at the Thomas Jefferson High School for Science and Technology, Alexandria, VA, December 2008 through January 2009.
(11) Finally, for an on-line version of a semester-long course including video recording of lectures, see the parallel algorithms course as taught in 2009 to graduate students at UMD. See also suggestions for what to teach and how, and the title below: Teaching Parallelism and Full On-Line Course

The September 15, 2007 tutorial kicked off an informal parallel programming course for high school students. Beyond the opening tutorial, the course was conducted through a weekly office hour that Scott Watson, an undergraduate teaching assistant, held at Montgomery Blair High School. Item 8 above, the home page for the informal course, provides practical guidance on using the XMT FPGA computer and links to 8 programming assigments.
Past tutorial: Tutorial on "How to Think Algorithmically in Parallel", Seattle, WA, Sunday, June 17, 2007, held in conjunction with the 21st ACM International Conference on Supercomputing (ICS).



Top | Programmer's Model | On-Line Tutorial | Software Release | Teaching Parallelism & Full On-Line Course | Talks | Publications | Team

Software Release of the XMT Environment

This public software release allows you to use your own computer for programming on an XMT environment and experimenting with it, as pointed out in Item (9) above. Download the software release.



Top | Programmer's Model | On-Line Tutorial | Software Release | Teaching Parallelism & Full On-Line Course | Talks | Publications | Team

Teaching Parallelism and Full On-Line Course

We provide all the material needed for a course on parallel algorithms coupled with XMTC programming assignments: video-recorded lectures for the full semester, class notes, dry homework assignments, and programming assignments along with a complete software environment:see the parallel algorithms course as taught in 2009 to graduate students at UMD. We also explain the need for such a course. Download teaching suggestions.



Top | Programmer's Model | On-Line Tutorial | Software Release | Teaching Parallelism & Full On-Line Course | Talks | Publications | Team

Talks (for talks presenting a particular publication, look under Publications)

·  (General talk:) Explicit Multi-Threading (XMT): A Theorist's Proposal for a Big Little Computer System. Download talk (27 slides). Versions of the SPAA'98 talk were given at IBM Haifa Research Laboratory (8/98), University of Maryland Computer Engineering Colloquium (9/98), The Computer Architecture and Parallel System Laboratory, University of Delaware (10/98), The Center for Research on Parallel Computation, Rice University (10/98), CASES98: Workshop on Compiler and Architecture Support for Embedded Systems, Washington, DC (12/98), New York University (12/98), IBM T.J. Watson (12/98), the Workshop on Parallel Algorithms (WOPA'99), Atlanta (5/99), Technion - Israel Institute of Technology (5/99), and Georgia Institute of Technology (3/00).

·  (Specialized talk:) Experiments with list ranking for Explicit Multi-Threaded (XMT) instruction parallelism (by S. Dascal and U. Vishkin). 3rd Workshop on Algorithms Engineering (WAE'99), London, U.K., July 1999.
Download talk (13 slides).

·  (General talk) What to do with all this hardware? Could the PRAM-On-Chip architecture lead to upgrading the WINTEL performance-to-productivity platform? Download abstract. Download talk (postscript, slides in 6 quads) or (pdf, 6 quads) Invited talks were given at the Intel Microprocessor Research Lab (MRL), Santa Clara, CA, and at Microsoft Research, Redmond, WA, in January 2002, and at IBM T.J. Watson Research Center, Yorktown Heights, NY, in June 2002.
Link to the Multi-University Research Lab Seminar Series for a recording of the Microsoft Research talk (based on the 6 quads of slides above).

·  (Specialized talk) Bending Light for Multi-Chip Virtual PRAMs? (12 slides), 3rd Workshop on Non-Silicon Computation, held in conjunction with the 31st International Symposium on Computer Architecture (ISCA 2004), June 19-23, 2004, Munich, Germany.

·  (Specialized talk) Pilot development time productivity study on the Spring 2004 course Parallel Algorithmics (12 slides), DARPA's High Productivity Computing Systems Program and Productivity Summit, Fairfax, VA June 29-30, 2004. For context, see http://www.highproductivity.org/

·  (General talk) PRAM-On-Chip: A Quest for Not-So-Obvious Non-Obviousness. Download abstract . Download talk (pdf, 33 slides) . Invited talks were given at Intel Research Lab, Petach Tiqwa (Israel), 8/2004, and at MFCS'04, Prague, 8/2004.

·  (General talk) Can Parallel Computing Finally Impact Mainstream Computing? Download abstract . Download talk (pdf, 40 slides) . Invited talks were given at Microsoft Research and Intel Research Pittsburgh, 3/2005 and 4/2005.

·  (General talk) PRAM on Chip: Advancing from Parallel Algorithms to a Parallel System. Download talk (pdf, 46 slides) . Talks were given at Sandia National Lab 3/2006 and the UMIACS Workshop on Parallelism in Algorithms and Architectures, May 12, 2006.

·  (General talk) PRAM-On-Chip Vision: Explicit Multi-Threading (XMT) Technology. Download talk (pdf, 53 slides) . Distinguished EECS Colloquium, University of Central Florida, March 2007.

·  (Panel discussion position) For General-Purpose Parallel Computing: It is PRAM or never (pdf, 17 slides) . Microsoft Workshop on Manycore Computing, Seattle, Washington, June 20-21, 2007 .

·  (General talk) Towards Realizing a PRAM-On-Chip Vision. Download talk (pdf, 60 slides) . Keynote, Workshop on Highly Parallel Processing On Chip (HPPC) , August 28, 2007, in conjuction with Euro-Par 2007, Rennes, France.

·  (General talk) The eXplicit MultiThreading (XMT) Parallel Computer Architecture - Next generation mutli-core supercomputing. Download talk (pdf, 64 slides) . Talk given to Intel, December 13, 2007 and elsewhere.

·  (Panel discussion presentation at IPDPS08, April 16, 2008, Miami, Florida) Topic of panel: How to avoid making the same mistakes all over again or... how to make the experiences of the parallel processing communities useful for the multi/many-core generation. Download presentation (pdf, 6X2 slides) . Program of IPDPS08 .

·  Disruptive Long-Term (~2018) R&D Goal: Easy-to-Program Many-Processor Supercomputer (ppt, 18 slides) . Department of Energy (DOE) Institute for Advanced Architecture and Algorithms(IAA) Interconnection Networks Workshop, July 21-22, 2008, San Jose, California.

·  (General talk) The PRAM-On-Chip Proof-of-Concept. Download talk (pdf, 42 slides) . Workshop on Theory and Many Cores (T&MC), University of Maryland, May 29, 2009. See also: Introduction to the Workshop. Download talk (pdf, 8 slides) .

·  Parallel Programming for High Schools. Download talk (ppt, 48 slides) . Keynote (joint with Mathematics Education Professor Ron Tzur, Purdue University, Mathematics Education graduate student David Ellison, University of Maryland and University of Indiana, and George Caragea, graduate student in the XMT team) at CS4HS Summer Workshop, Carnegie-Mellon University, July 24-27, 2009. The talk incdlues a 4-part interview with Shane Torbert a Computer Science and Mathematics teacher at Thomas Jefferson High School for Science and Technology, Alexandria, Virginia. For the interview click Part 1: Why XMT (m4v player, 4 minutes) , Part 2: Ease of Use (m4v player, 3 minutes) , Part 3: Content Focus (m4v player, 3 minutes) , and Part 4: Final word to teachers (m4v player, 3 minutes) . Mr. Torbert studied by himself (PRAM algorithm and) XMT programming from material available through this webpage. Note also pictures from a Montgomery County (Maryland) Public Schools middle-school summer camp (for rising 6-8th graders who are 10-13 year old) from underrepresented groups, July 13-24, 2009.

·  (Panel discussion presentation at the Workshop on Highly Parallel processing on Chip (HPPC), August 25, 2009, as part of Europar 2009, Delft, The Netherlands) Topic of panel: Are Many-Core Computer Vendors on Track? Download presentation (ppt, 12 slides) . Panel summary .

·  (General talk) No Need to Constrain Many-Core Parallel Programming. Download abstract.
Computer science version: Download talk (ppt, 42 slides), talk given at the Parallel Computing Day, Ben Gurion University, Beer Sheva, Israel, October 20, 2009.
Engineering version: Download talk (pptx, 47 slides), talk given at ICCD09, EE@Technion and Intel Haifa, October 2009.

·  Position: Reinvention of computing for many-core parallelism requires addressing programmer's productivity. Download extended presentation (ppt, 18 slides) . Presentation at the Indo-US Workshop on Parallelism and the Future of High-Performance Computing, January 9-10, 2010, Bangalore, India .

·  (General talk) Using Simple Abstraction to Guide the Reinvention of Computing for Parallelism. Parallel@Illinois Distinguished Lecture and a Computer Science Colloquium, University of Illinois, Urbana-Champaign, March 15, 2010. Abstract, video, and slides from the Parallel@Illinois Archives. Power point slides are available here (pptx, 52 slides). Talks with the same title were given at the Hebrew University, Jerusalem, Israel, May 30, 2010 and University of Vienna, Austria, June 2, 2010. Power point slides are available here (pptx, 52 slides).

·  (General talk) From asymptotic PRAM speedups to easy-to-obtain concrete XMT ones, invited talk, DIMACS Workshop on Parallelism: A 20-20 Vision, March 14-16, 2011. Download abstract. Download slides (pdf, 24 slides).

·  G. Caragea, J. Edwards and U. Vishkin. Can PRAM Graph Algorithms Provide Practical Speedups on Many-Core Machines? contributed talk, DIMACS Workshop on Parallelism: A 20-20 Vision , March 14-16, 2011. Get the slides for the talk by James Edwards from DIMACS webpage.

·  U. Vishkin. Panel: looking back at 25 years of IPDPS, Anchorage, Alaska, May 13, 2011. Download slides.

·  (General talk) Current mainstream computing: If in a hole, should we keep digging? Abstract, presented on June 2, 2011 at the Computer Science Department, University of California, San Diego. 61 slides, or link through the UCSD class on Parallel Algorithms Later talks expending on this title were given at ETH Zurich (11/2011), Tel Aviv University (11/2011), Georgetown University (12/2011) and Los Alamos National Lab (1/2012).

·  U. Vishkin. Panel: Teaching Parallelism, ACM-SPAA, San Jose, California, June 4, 2011. Download slides.

·  (General talk) Can the Floodgate of General-Purpose Computing Innovation Reopen with Many-Core Parallelism? Computer Science Colloquium, Purdue University, February 23, 2012.

·  (General talk) Time to Right the Transition to General-Purpose Many-Core Parallelism. Illinois-Intel Parallelism Center (I2PC) Distinguished Speaker Series, April 12, 2012. Video.

·  (General talk) NIH/NCBI Computational Biology Branch Seminar, April 27, 2012 and NIH/NCI, June 8, 2012.

·  (Position on research direction) Principles Matter and Can Matter More: Big Lead of PRAM Algorithms on Prototype-HW. Download Uzi Vishkin's abstract and slides. NSF Workshop on Research Directions in the Principles of Parallel Computation, Pittsburgh, Pennsylvania, USA June 28, 2012.

·  (General talk) IBM T.J. Watson, June 3, 2013.

·  (General talk) Is General-Purpose Many-Core Parallelism Imminent?. Distinguished Lecture Series, School of Computing Science, Simon Fraser University, June 18, 2013.

·  (General talk) Computer Science & Engineering Colloquia Series. Washington University, St. Louis, November 1, 2013.

·  J.A. Edwards and U. Vishkin.High-Bandwidth Communication Avoidance: Oxymoron Or Recipe? Talk to be given by James Edwards. Minisymposium on Minimizing Communication in Linear Algebra, 16th SIAM Conference on Parallel Processing for Scientific Computing, Portland, Oregon, February 18-21, 2014.


Top | Programmer's Model | On-Line Tutorial | Software Release | Teaching Parallelism & Full On-Line Course | Talks | Publications | Team

Publications

·  U. Vishkin, S. Dascal, E. Berkovich and J. Nuzman. Explicit Multi-Threading (XMT) Bridging Models for Instruction Parallelism (Extended Abstract). In Proc. 10th ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1998.
Note: This paper introduces XMT to readers whose background includes reasonable understanding of algorithms and theory. The presentation is by way of a bridging model. Among other things, a bridging model provides a design space for algorithm designers and programmers, as well as a design space for computer architects. It is convenient to describe our wider vision regarding ``parallel-computing-on-a-chip'' as a two-stage development and therefore two bridging models are presented: Spawn-based multi-threading (Spawn-MT) and Elastic multi-threading (EMT).
Download TR (12 pages).

·  U. Vishkin, S. Dascal, E. Berkovich and J. Nuzman. Explicit Multi-Threading (XMT) Bridging Models for Instruction Parallelism (Extended Summary and Working Document). Current version of UMIACS TR-98-05 . First version: January 1998.
Note: This long paper is really the MAIN publication for the XMT framework, as first conceived. The SPAA'98 paper gives a principled presentation of XMT. It also introduces XMT to readers with algorithms and theory background. The next central presentation for the project is the journal version of the SPAA'01 paper. In contrast, the September 1999 TR UMIACS99-55 by Berkovich, Nuzman, Franklin, Jacob and Vishkin is not a good single representative of the XMT project, as a whole; while it describes possible architecture choices for the XMT framework, and studies their microarchitecture performance implications, some part of it are misleading when it comes to understanding choices actually made for XMT. The paper emphasizes an important ``decentralization'' property of the architecture. Its performance hardly degrades even when the number of cycles for traversing wires (interconnects) increases. The introduction that the latter paper provides to XMT is less comprehensive and less direct than the SPAA98 paper, as some aspects of XMT are not addressed, and some architecture choices were made for the sake of completeness of the architecture and not because they are required by the XMT paradigm.
Download TR (47 pages).

·  S. Dascal and U. Vishkin. Experiments with list ranking for Explicit Multi-Threaded (XMT) instruction parallelism (extended abstract). In Proc. 3rd Workshop on Algorithms Engineering (WAE'99), London, U.K., July 1999.
Download extended abstract (17 pages).
Download talk (13 slides).
An extended summary of this work is available as UMIACS-TR-99-33. Download extended summary (27 pages).
Invited to the Special Issue for WAE99: Journal version, from the ACM Journal of Experimental Algorithmics, Volume 5, 2000.

·  E. Berkovich, J. Nuzman, M. Franklin, B. Jacob, U. Vishkin. XMT-M: A scalable decentralized processor. UMIACS TR 99-55, September 1999.
Download TR (postscript, 18 pages). Note: as the above comment following UMIACS TR-98-05 states, UMIACS TR 99-55 could be misleading in understanding architecture choices made for XMT and is not a good single representative publication of the XMT project.

·  U. Vishkin. A No-Busy-Wait balanced tree parallel algorithmic paradigm. In Proc. 12th ACM Symposium on Parallel Algorithms and Architectures (SPAA), 2000.
Download TR (postscript, 9 pages). pdf.

·  D. Naishlos, J. Nuzman, C-W. Tseng, and U. Vishkin. Evaluating multi-threading in the prototype XMT environment. In Proc. 4th Workshop on Multi-Threaded Execution, Architecture and Compilation (MTEAC2000), December 2000 (held in conjunction with the 33rd Int. Symp. on Microarchitecture MICRO-33). Best Paper Award for MTEAC2000.
Download TR (pdf, 9 pages).

·  D. Naishlos, J. Nuzman, C-W. Tseng, and U. Vishkin. Evaluating the XMT Parallel Programming Model. In Proc. of the 6th Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS-6), April 2001.
Download TR (pdf, 6 pages).

·  U. Vishkin. Two techniques for reconciling algorithm parallelism with memory constraints (extended abstract). In Proc. 14th ACM Symposium on Parallel Algorithms and Architectures (SPAA), 2002.
Download TR (postscript, 4 pages). Download talk (postscript, 13 slides).

·  D. Naishlos, J. Nuzman, C-W. Tseng, and U. Vishkin. Towards a First Vertical Prototyping of an Extremely Fine-Grained Parallel Programming Approach. The conference version appeared in Proc. 13th ACM Symposium on Parallel Algorithms and Architectures (SPAA-01)), July 2001.
Download TR (pdf, 10 pages).
The journal version appeared in the invited Special Issue for SPAA01: TOCS 36,5 pages 521-552, Springer-Verlag, 2003.
Download the TR that led to the journal version (pdf, 26 pages).

·  A. Balkan, G. Qu, and U. Vishkin. Arbitrate-and-Move primitives for high throughput on-chip interconnection networks. In Proc. IEEE International Symposium on Circuits and Systems (ISCAS), SoC Design Technology lecture session , Vancouver, May 23-26, 2004.
Download TR (pdf, 4 pages).

·  U. Vishkin. Can Optical Interconnection Networks Lead to Cheaper High-Performance Multiprocessors? Proc. Optics East, S.P.I.E. - The International Society for Optical Engineering, Active and Passive Optical Components for WDM Communications IV, Volume 5595, Philadelphia, October 26-28, 2004, 162-166. Download TR (pdf, 5 pages).
The computer science and engineering side of this work was presented in the paper:

·  U. Vishkin and I. Smolyaninov. Bending Light for Multi-Chip Virtual PRAMs? Proc. 3rd Workshop on Non-Slicon Computation, held in conjunction with the 31st International Symposium on Computer Architecture (ISCA 2004), June 19-23, 2004, Munich, Germany, 32-38.
Download TR (pdf, 7 pages).

·  A. Tzannes, R. Barua, G.C. Caragea, and U. Vishkin. Issues in writing a parallel compiler starting from a serial compiler. Draft, 2006.

·  P. Gu and U. Vishkin. Case Study of Gate-level Logic Simulation on an Extremely Fine-Grained Chip Multiprocessor Journal of Embedded Computing, Special Issue on Embedded Single-Chip Multicore Architectures and related research - from System Design to Application Support, 2, 2 (2006), 181-190.
The test environment including the XMT machine, parallel logic simulation program and benchmark circuits .
Download paper (pdf, 10 pages).

·  A. Balkan, G. Qu and U. Vishkin. Mesh-of-Trees and Alternative Interconnection Networks for Single-Chip Parallel Processing. In Proc. 17th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2006), Steamboat Springs, Colorado, September 11-13, 2006, 73-80. Best Paper Award.

·  U. Vishkin, I. Smolyaninov and C. Davis. Plasmonics and the parallel programming problem. Silicon Photonics Conference, SPIE Symposium on Integrated Optoelectronic Devices 2007, January 20-25, 2007, San Jose, CA.
Download TR (pdf, 10 pages).

·  X. Wen and U. Vishkin. PRAM-On-Chip: First Commitment to Silicon. Brief announcement. Proc. 19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 301-302, June 9-11, 2007.
Download TR (pdf, 2 pages). Download talk (pdf, 17 slides).

·  A. Balkan, M. Horak, G. Qu, and U. Vishkin. Layout-Accurate Design and Implementation of a High-Throughput Interconnection Network for Single-Chip Parallel Processing. Hot Interconnects 2007, Stanford University, CA, August 22-24, 2007. Download TR (pdf, 8 pages). Download talk (pdf, 21 slides).

·  U. Vishkin, G. Caragea and B. Lee. Models for Advancing PRAM and Other Algorithms into Parallel Programs for a PRAM-On-Chip Platform. Chapter 5, Handbook on Parallel Computing: Models, Algorithms, and Applications, Editors: S. Rajasekaran and J. Reif, Chapman and Hall/CRC Press, 2008. This paper provides models for developing performance-tuned XMT programs. There is no conflict between this work and the quest for ease-of-programming using the PRAM model. The implied methodology (for the XMT system developers) is as follows. Given a simple PRAM-like program manually performance-tune it using the performance models in this paper. Then teach the compiler to produce the performance-tuned program. ·  Download TR (pdf, 62 pages).

·  L. Hochstein, V. Basili, U. Vishkin and J. Gilbert. A pilot study to compare programming effort for two parallel programming models. Journal of Systems and Software Volume 81 Issue 11, 1920-1930, November, 2008. This paper compares the programming effort in MPI and XMTC for a similar programming assignment. The main finding is that with high confidence XMTC development time is nearly half of MPI. Download TR (pdf, 26 pages).

·  X. Wen and U. Vishkin. FPGA-based prototype of a PRAM-on-chip processor, ACM Computing Frontiers, Ischia, Italy, May 5-7, 2008. Download TR (pdf, 12 pages). Download talk (pdf, 21 slides).

·  F. Keceli, T. Moreshet, and U. Vishkin. Power study of a 1000-core processor. 17th International Workshop of Logic and Synthesis, Lake Tahoe, Ca, June 4-6., 2008.

·  A. Balkan, G. Qu, and U. Vishkin. An Area-Efficient High-Throughput Hybrid Interconnection Network for Single-Chip Parallel Processing. 45th Design Automation Conference, Anaheim, CA, June 8-13, 2008. Download TR (pdf, 6 pages).

·  X. Wen and U. Vishkin. The XMT FPGA prototype/cycle-accurate-simulator hybrid. The 3rd Workshop on Architectural Research Prototyping, WARP08, June 21-22, 2008, Beijing, China, held in conjunction with ISCA 2008. Download extended abstract. Download slides of Xingzhi Wen's talk.

·  T. DuBois, B. Lee, Y. Wang, M. Olano and U. Vishkin. XMT-GPU: A PRAM architecture for graphics computation. Proc. 37th International Conference on Parallel Processing (ICPP08), Portland, Oregon, September 8-12, 2008. Download TR (pdf, 9 pages). Download talk (pdf, 25 slides).

·  A. Balkan, G. Qu, and U. Vishkin. Mesh-of-Trees and alternative interconnection networks for single-chip parallelism. IEEE Transactions on VLSI 17,10 (2009), 1419-1432.

·  G.G. Caragea, A.B. Saybasili, X. Wen and U. Vishkin. Performance potential of an easy-to-program PRAM-On-Chip prototype versus state-of-the-art processor. Proc. 21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 163-165, Calgary, Canada, August 10-13, 2009. Download TR (pdf, 3 pages). Download talk (ppt, 14 slides).

·  U. Vishkin. Algorithmic approach to designing an easy-to-program system: can it lead to a HW-enhanced programmer's workflow add-on? Proc. International Conference on Computer Design (ICCD), Lake Tahoe, CA, October 4-7, 2009. Download TR (pdf, 4 pages). Download talk (pptx, 47 slides): this is a slightly updated version of the slides following later talks at the Electrical Engineering Department, Technion and Intel, Haifa, October 21 and 22, 2009. Download abstract for these talks.

·  A. B. Saybasili, A. Tzannes, B.R Brooks and U. Vishkin. Highly parallel multi-dimensional fast Fourier transform on fine- and course-grained many-core approaches. Proc. 21st Conference on Parallel and Distributed Computing Systems (PDCS), Cambridge, MA, November 2-4, 2009. Download TR.

·  A. Tzannes, G.C. Caragea, R. Barua and U. Vishkin. Lazy binary splitting: A run-time adaptive dynamic work-stealing scheduler. Proc. 15th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), Bangalore, India, January 9-14, 2010. Download TR (pdf, 11 pages). Download Alex Tzannes's talk (ppt, 39 slides).

·  S. Torbert, U. Vishkin, R. Tzur and D. Ellison. Is teaching parallel algorithmic thinking to high-school student possible? One teacher's experience. Proc. 41st ACM Technical Symposium on Computer Science Education (SIGCSE), Milwaukee, WI, March 10-13, 2010. Download from the Parallel Computing session at SIGCSE. Download TR (pdf, 5 pages). Talk by Thomas Jefferson High School teacher Shane Torbert (pdf, 24 slides).

·  M.N. Horak, S.M. Nowick, M. Carlberg and U. Vishkin. A low-overhead asynchronous interconnection network for GALS chip multiprocessor. Proc. 4th ACM/IEEE International Symposium on Networks-on-Chip (NOCS2010), Grenoble, France, May 3-6, 2010. Download TR (pdf, 8 pages). Extended slides of talk by Prof. Steve Nowick, Columbia University (pdf, 84 slides). Journal version: IEEE Transactions on CAD 30,4 (April 2011), 494-507. Special Issue for NOCS 2010.

·  C.G. Caragea, F. Keceli, A. Tzannes and U. Vishkin. General-purpose vs. GPU: Comparison of many-cores on irregular workloads. Proc. HotPar 2010, University of California, Berkeley, June 14-15, 2010, Download TR (pdf, 7 pages).) Download poster introduction slide (pptx, 1 slide).) Download poster (pptx, 1 slide).)

·  C.G. Caragea, A. Tzannes, F. Keceli, R. Barua and U. Vishkin. Resource-aware compiler prefetching for many-cores. Proc. 9th Int. Symp. on Parallel and Distributed Computing (ISPDC), Istanbul, Turkey, July 7-9, 2010. Download TR (pdf, 8 pages).) Journal version: to appear in International Journal of Parallel Programming Download TR (pdf, 26 pages).)

·  U. Vishkin. Using simple abstraction to reinvent computing for parallelism. Communications of the ACM (CACM) 54,1, pages 75-85, January, 2011. Download from CACM. [Download TR (pdf, 9 pages); note: the published version should be accessible to many more readers, as it was edited by the CACM.]

·  U. Vishkin. Algorithms-based extension of serial computing education to parallelism. Workshop on Integrating Parallelism Throughout the Undergraduate Computing Curriculum, in Conjunction with the 16th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2011), San Antonio, Texas, February 12, 2011. Download abstract (pdf, 3 pages).)

·  D. Padua, U. Vishkin and J. Carver. Joint UIUC/UMD parallel algorithms/programming course. First NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar-11), in conjunction with IPDPS, Anchorage, Alaska, May 16, 2011. Download abstract (pdf, 2 pages). Download slides.

·  F. Keceli, A. Tzannes, G. Caragea, R. Barua and U. Vishkin. Toolchain for Programming, Simulating and Studying the XMT Many-Core Architecture. Proc. 16th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2011), in conjunction with IPDPS, Anchorage, Alaska, May 20, 2011. Download paper. Download slides of Fuat Keceli's talk.
Download supporting technical report: A. Tzannes, G.C. Caragea, U. Vishkin and R. Barua. The compiler for the XMTC parallel language: Lessons for compiler developers and in-depth description. UMIACS-TR 2011-01.
Download supporting technical report: F. Keceli and U. Vishkin. XMTSim: A Simulator of the XMT Many-core Architecture. UMIACS-TR 2011-02.

·  G.C. Caragea and U. Vishkin. Better speedups for parallel max-flow. Brief announcement. In Proc. 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), San Jose, California, June 2-4, 2011. Download paper. Download slides of George Caragea's talk. Open source graph algorithms.

·  N. Crowell. Parallel algorithms for graph problems, May 2011. MSc scholarly paper by a Computer Science student who evaluated XMT for some wide-interest benchmarks. This student was not part of the XMT team. Crowell's work provides independent validation of evidence published by the XMT team on the advantages of XMT. In particular, his experience has been that, beyond developing serial programs for the problems he worked on, the extra effort for producing parallel code was minimal. Download paper.

·  U. Vishkin. Restoring software productivity crucial to economic recovery: The multi-core dilemma. White paper, the Technology Innovation Program (TIP), National Institute of Standards and Technology (NIST). Input to help TIP focus the TIP program on areas of critical national need. July 2011. Download this single page paper.

·  F. Keceli, T. Moreshet and U. Vishkin. Thermal Management of a Many-Core Processor under Fine-Grained Parallelism. Proc. 5th Workshop on Highly Parallel processing on Chip(HPPC 2011), August 30, 2011, in conjunction with Euro-Par 2011, Bordeaux, France.

·  F. Keceli, T. Moreshet and U. Vishkin. Power-Performance Comparison of Single-Task Driven Many-Cores. Proc. 17th IEEE International Conference on Parallel and Distributed Systems (IEEE ICPADS), December 7-9, 2011, Tainan, Taiwan. Download paper.

·  J.A. Edwards and U. Vishkin. Better Speedups Using Simpler Parallel Programming for Graph Connectivity and Biconnectivity. Proc. International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2012), 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming( PPoPP'12), New Orleans, LA, February 25-29, 2012. Open source graph algorithms.

·  A. Tzannes. Advisors: R. Barua and U. Vishkin. Enhancing Productivity and Performance Portability of General-Purpose Parallel Programming. First Prize. ACM Student Research Competition, ACM International Conference on Parallel Architectures and Compilation Techniques (PACT) 2011.

·  T. Moreshet, U. Vishkin and F. Keceli. Parallel Simulation of Many-core Processors: Integration of Research and Education, in proceedings of the 119th American Society for Engineering Education (ASEE) Annual Conference, June 10-13, 2012, San Antonio, Texas. Download paper.

·  J.A. Edwards and U. Vishkin. Speedups for Parallel Graph Triconnectivity. Brief announcement. In Proc. 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), Pittsburgh, PA, June 25-27, 2012. Download paper and James Edwards' slides. Also, Open source graph algorithms.

·  J.A. Edwards and U. Vishkin. Truly Parallel Burrows-Wheeler Compression and Decompression. Brief announcement. In Proc. 25th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), Montreal, Canada, July 23-25, 2013. Download SPAA13 paper and slides. For more detail see the TCS version and its supporting technical reports below.

·  J.A. Edwards and U. Vishkin. Parallel Algorithms for Burrows-Wheeler Compression and Decompression. Theoretical Computer Science (TCS), due in 2014. Preprint. On-line version. Technical report of the theoretical algorithm. Technical report of the companion empirical algorithm.

·  U. Vishkin. Is Multicore Hardware for General-Purpose Parallel Processing Broken? Communications of the ACM (CACM), Volume 57, No. 4, pages 35-39, April 2014. Viewpoint article and video. The video is entitled: Does humans-in-the-service-of-technology have a future? .

Patents

8 US patents have been issued so far: 8,209,690, 8,145,879, 7,707,388, 7,523,293, 7,505,822, 6,768,336, 6,542,918, and 6,463,527. A partial summary if their content is provided in the paper Algorithmic approach to designing an easy-to-program system: can it lead to a HW-enhanced programmer's workflow add-on? (pdf, 4 pages). Proc. International Conference on Computer Design (ICCD), 2009.

Prior Generation XMT Tools & Empirical Results

The above Prior Generation link leads to older software tools as well as older supplemental information to XMT publications.


Top | Programmer's Model | On-Line Tutorial | Software Release | Teaching Parallelism & Full On-Line Course | Talks | Publications | Team

Current and recent past students in the Research Team

  • Aydin Balkan, graduate student, Electrical and Computer Engineering, University of Maryland.
  • Xingzhi Wen, graduate student, Electrical and Computer Engineering, University of Maryland.
  • Fuat Keceli, graduate student, Electrical and Computer Engineering, University of Maryland.
  • George Caragea, graduate student, Computer Science, University of Maryland.
  • Alex Tzannes, graduate student, Computer Science, University of Maryland.
  • Mike Horak, graduate student, Electrical and Computer Engineering, University of Maryland.
  • Beliz Saybasili, visiting graduate student from Turkey.
  • Mary Kiemb, graduate student, Electrical and Computer Engineering, University of Maryland.

Faculty in the Research Team

  • Professor Rajeev Barua, Electrical and Computer Engineering, University of Maryland.
  • Professor Gang Qu, Electrical and Computer Engineering, University of Maryland.

This research project is directed by Professor Uzi Vishkin.

Former Students

  • Efraim Berkovich, MS, Electrical and Computer Engineering, University of Maryland.
  • Shlomit Dascal, MS, Computer Science, Tel Aviv University; was visiting graduate student at UMIACS.
  • Dorit Naishlos, MS, Computer Science, University of Maryland.
  • Joseph Nuzman, MS, Electrical and Computer Engineering, University of Maryland. Joe started working on XMT while still an Honors ECE undergraduate student.
  • Pei Gu, MS, Electrical and Computer Engineering, University of Maryland.
  • Roni Kupershtok, post-doc, Electrical and Computer Engineering, University of Maryland.
  • Adam Beytin, undergraduate student, Computer Engineering, University of Maryland.
  • Nghi Nguen Ba, undergraduate student, Computer Engineering, University of Maryland.
  • Paul Mazzucco, undergraduate student, Computer Engineering, University of Maryland.
  • Yoni Ashar, undergraduate student, Computer Science, University of Maryland.
  • Darya Fillipova, undergraduate student, Computer Science, University of Maryland.
  • Mike Detwiler, undergraduate student, Computer Science, University of Maryland.
  • Tom Dubois, graduate student, Computer Science, University of Maryland.

Top | Programmer's Model | On-Line Tutorial | Software Release | Teaching Parallelism & Full On-Line Course | Talks | Publications | Team

Funding by the National Science Foundation and the Department of Defense is graciously acknowledged. Some of the material on this web page, and XMT subpages linked from this page, is based upon work supported by the National Science Foundation under grants No. 0325393, 0811504 and 0834373. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Department of Defense.