Teaching and Self-Study of Parallelism:

A Full On-Line Course on Parallel Algorithms Coupled with XMTC Programming Assignments



Background

The so-called PRAM (parallel random-access machine, or model) parallel algorithmic theory has been widely embraced by the Computer Science theory and algorithms communities as a general-purpose approach to parallel algorithms. This distinction is unique to the PRAM approach. Starting from PRAM algorithms, the Explicit Multi-Threading (XMT) project, which started at the University of Maryland (UMD) in 1997, provides a holistic approach to desktop supercomputing. A 64-processor XMT machine has been extensively tested and used at UMD since early 2007. Envisioning a 1000-processor on-chip parallel machine in the early 2010s, the XMT architecture can be effectively programmed using the PRAM-like XMTC programming language, which adds only two simple instructions to the standard C programming language. For more background on the XMT approach, please go to the XMT home page http://www.umiacs.umd.edu/users/vishkin/XMT
Learning the basic concepts of parallelism in algorithms and programming without actually doing the programming is insufficient (perhaps like learning to swim by correspondence). The platform noted on the current web page will allow you to use your own computer for practicing programming within an XMT environment. We suggest material for courses on parallel algorithms and parallel programming in Section I below. Section II below provides a link to suggestions on self-study of this material.

I. TEACHING PARALLELISM


Contents:

MOTIVATION FOR EDUCATORS

The huge investment in parallel computing over several decades produced a lot of knowledge. However, the world is yet to see a truly successful general-purpose parallel computer, judged by single-task completion time. Still, even the most conservative computer vendors decided to abandon the serial paradigm that worked for them for so long and bet the future of their companies on parallel computing. It is important to realize that they really did not have a choice. More than five years have passed since around 2003, when the increased power consumption of computer chips froze progress in clock frequency.

The actions of vendors so far are truly puzzling:
  1. They essentially bet their future on approaches that are proven failures (as general-purpose approaches).
  2. While they are committed to exponentially increasing the number of processors on a chip over the coming decade, they are not giving enough details on their architectures, or on how these will be effectively programmed. This deters application software vendors (ASVs) who consider making big investments. The reason is that competitor ASVs will be able to provide a better product for a fraction of the cost by just waiting a few years till the dust settles on these architectures. This ASV impasse can lead to a serious recession in a significant sector of the IT market.
  3. The PRAM approach has been a decisive winner (when it comes to parallel computing) in the theory/algorithms community. Vendors would like to replace serial computing with parallel computing, but often do not consider algorithms, or how to teach them, as something that must be a part of their concern.

What can we do as CS educators?

There is no question anymore whether we should start teaching parallel algorithms and programming at all levels. The question is when. IMHO we will deserve malpractice suits if we defer this teaching for too long. Currently, we produce 22-year-old dinosaurs trained for a 50-year career dominated by parallelism by programming yesterday's computers. In fact, we don't only under-teach: we mis-teach bad serial habits that will make it much more difficult to switch to parallelism later.

Luckily the ASV impasse noted above does not affect education.
  1. PRAM algorithms are necessary knowledge for any CS graduate, as explained next. In the same way that serial algorithmics requires accounting for the total number of operations (time complexity) of an algorithm, the PRAM approach requires accounting for the total number of operations of a parallel algorithm ("work") and for its time under the assumption that unlimited hardware is available ("depth"). This work-depth level of cognition falls within the common denominator of all other approaches to parallel computing, and therefore any CS graduate will have to study it.
  2. PRAM algorithms are also sufficient. In the past, the PRAM was criticized for being too simplistic for practice. However, the 64-processor PRAM-On-Chip hardware prototype built at UMD http://www.umiacs.umd.edu/users/vishkin/XMT/CompFrontiers08.pdf finally showed that a machine that can look to the programmer like a PRAM can be built. In fact, the basic XMT (explicit multi-threaded) architecture can be scaled to 1000 on-chip processors.
Based on item 1 (PRAM algorithms are necessary knowledge), we can immediately go ahead and teach PRAM algorithms without waiting for the vendors to converge. This will be a step in the right direction regardless of which approaches to parallelism prevail.

MOTIVATION FOR TEACHING PARALLELISM IN THEORY CLASSES

Motivation for incorporating programming assignments in theory courses on parallelism

It would be nice if theorists endorsed the previous argument and used it to convince their colleagues that PRAM algorithms need to be taught. But I am concerned that some of us will do the following: teach a course on parallel algorithms as a purely theory course WITHOUT any connection to programming. This would miss the point, as it ignores the need to relate algorithms to programming. The Q&A at the end of this text elaborates further on the programming issue.

SUGGESTIONS FOR WHAT TO TEACH AND HOW

  1. In-class presentation: Uzi Vishkin's UMD course ENEE759K/CMSC751 Parallel Algorithms, Spring 2009 is now fully available on-line.
  2. Supporting documentation on XMTC programming. Refer the students to the XMTC Manual and the XMTC tutorial.
  3. Programming assignments. Please look under assignments on this course page, or on this alternative course page. For XMTC code examples download this paper (pdf, 62 pages). Finally, instructors can request more code examples by e-mail.
  4. Running programming assignments. In mid-September 2008, the UMD XMT project released to the public the XMT software tools http://www.umiacs.umd.edu/users/vishkin/XMT/sw-release.html. These allow your students to run XMTC code on an emulated XMT machine. To remind you, a hardware prototype of such a 64-processor machine (using FPGA technology) has been in use at UMD since January 2007. A compiler that translates XMTC to OpenMP is also available for download, giving your students an alternative way to run their assignments.
Finally, please note that this type of programming cannot be too difficult. I gave a one-day parallel algorithms tutorial to a dozen high school students in Fall 2007, and subsequently some of them managed to complete 8 programming assignments on their own. In fact, the above link to programming assignments gives these 8 assignments. The only help the high school students got was one office hour per week with an undergraduate teaching assistant. They did not get any school credit for their work. Their participation was in the context of a computer club, after completing their regular school work (8 periods per day).

If you are looking for code examples, you are welcome to write to me.

INTERESTING TEACHING LINKS


II. SELF-STUDY OF PARALLELISM

Our suggestions for self-study are already covered under the on-line tutorial.

Q&A

Q: I never learned parallel programming formally, but I picked up some ideas in my free time from Java/MPI/OpenMP/etc. How do any of these relate to XMTC parallel programming?

A: XMTC parallel programming is simpler and different.

Q: The problem of algorithms being taught independently of programming is present within the exclusively serial world. What would you say to the many theorists who are resistant to the idea of having a heavy programming component in their courses?

A: IMHO the serial case is completely different. Most students have experienced/learned serial programming BEFORE taking the serial algorithms course. This is NOT the case for parallel programming. My experience is that students learn best if parallel programming is coupled with parallel algorithms. The main difference is that the parallel algorithms course is where parallel programming should be FIRST taught. The reason is that parallelism requires the introduction of some first principles representing an "alien culture" to students. Serial computing, in contrast, is related to: (i) mathematical induction, and (ii) the way our brain instructs our body (as a single processor). Very little, however, has prepared us for parallel computing.


Note: The basic points above were first presented at a panel discussion at IPDPS 2008, Miami, Florida. The topic was: "How to avoid making the same mistakes all over again or... how to make the experiences of the parallel processing communities useful for the multi/many-core generation". The slides of the presentation are available here. The panel presentation also points out that 19 of the 38 participants in an IBM-NSF workshop on future directions for parallel computing, which took place in December 1988, were algorithms/theory people. Since IPDPS08 had 600 registrants, I suggested to the IPDPS08 audience to imagine that there is another group of 600 algorithms/theory people in the room, and to consider the effect of nearly all of them saying that we should go ahead and incorporate the PRAM into future multi-core architectures. Lance Fortnow's recent blog postings The Revenge of Parallelism and Teaching Parallelism led to discussion and further development of these points.