A Full On-Line Course on Parallel Algorithms Coupled with XMTC Programming Assignments
Background
The so-called PRAM (for parallel random-access machine, or model) parallel algorithmic theory
has been widely embraced by the Computer Science theory and algorithms communities, as a general-purpose
approach to parallel algorithms. This is a quite unique distinction of the PRAM approach.
Starting with PRAM algorithms, the Explicit Multi-Threading (XMT) project, which started at the University of Maryland (UMD) in 1997, provides a holistic approach to desktop supercomputing. A 64-processor XMT machine has been extensively tested and used at UMD since early 2007. Designed in anticipation of 1000-processor on-chip parallel machines in the early 2010s, the XMT architecture can be programmed effectively in XMTC, a PRAM-like programming language that adds only two simple instructions to the standard C programming language.
For more background on the XMT approach, please see the XMT home page: http://www.umiacs.umd.edu/users/vishkin/XMT
Learning the basic concepts of parallel algorithms and programming without actually doing the programming is insufficient (perhaps like learning to swim by correspondence).
The platform noted on this web page will allow you to use your own computer to practice programming in an XMT environment.
We suggest material for courses on parallel algorithms and parallel programming in Section I below. Section II provides a link to suggestions for self-study of this material.
I. TEACHING PARALLELISM
MOTIVATION FOR EDUCATORS
The huge investment in parallel computing over several decades has produced a great deal of knowledge. However, the world has yet to see a truly successful general-purpose parallel computer, judged by the completion time of a single task.
Still, even the most conservative computer vendors have decided to abandon the serial paradigm that worked for them for so long and to bet the future of their companies on parallel computing. It is important to realize that they really did not have a choice: more than five years have already passed since around 2003, when the increased power consumption of computer chips froze progress in clock frequency.
The actions of vendors so far are truly puzzling:
- They have essentially bet their future on approaches that are proven failures (as general-purpose approaches).
- While they are committed to increasing the number of processors on a chip exponentially over the coming decade, they are not giving enough details on their architectures, or on how these architectures will be programmed effectively. This deters application software vendors (ASVs) who are considering big investments: a competitor ASV could provide a better product for a fraction of the cost simply by waiting a few years until the dust settles on these architectures. This ASV impasse can lead to a serious recession in a significant sector of the IT market.
- The PRAM approach has been a decisive winner (when it comes to parallel computing) in the theory/algorithms community. Vendors would like to replace serial computing with parallel computing, but often do not consider algorithms, or how to teach them, to be part of their concern.
What can we do as CS educators?
There is no longer any question of whether we should teach parallel algorithms and programming at all levels; the question is when. IMHO, we will deserve malpractice suits if we defer this teaching for too long. Currently, we produce 22-year-old dinosaurs trained for a 50-year career dominated by parallelism by having them program yesterday's computers. In fact, we do not only under-teach: we mis-teach bad serial habits that will make it much more difficult to switch to parallelism later.
Luckily, the ASV impasse noted above does not affect education.
- PRAM algorithms are necessary knowledge for any CS graduate, as explained next. In the same way that the serial approach requires accounting for the total number of operations (time complexity) of an algorithm, the PRAM approach requires accounting for the total number of operations of a parallel algorithm (its "work") and for its time under the assumption that unlimited hardware is available (its "depth"). This work-depth level of cognition falls within the common denominator of all other approaches to parallel computing, and therefore any CS graduate will have to study it.
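To make the work-depth accounting concrete, consider summing n numbers. A balanced binary tree of additions performs n-1 additions in total (work O(n)) but needs only log n rounds (depth O(log n)). The XMTC-style sketch below is added here purely for illustration and is not from the original text; the exact syntax should be checked against the XMTC Manual:

```
// Sum n numbers (n a power of 2) stored in A[0..n-1]; the result ends in A[0].
// Work: n-1 additions in total.  Depth: log2(n) rounds.
int s;
for (s = n/2; s >= 1; s = s/2) {    // log2(n) rounds, one after the other
    spawn(0, s-1) {                 // s virtual threads execute in parallel
        A[$] = A[2*$] + A[2*$+1];   // $ denotes the thread's ID in XMTC
    }
}
```

Whatever number of processors ends up executing the threads, the work and depth counts are unchanged; this processor-independence is exactly what the work-depth methodology provides.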
- PRAM algorithms are also sufficient. In the past, the PRAM was criticized as being too simplistic for practice. However, the 64-processor PRAM-On-Chip hardware prototype built at UMD
http://www.umiacs.umd.edu/users/vishkin/XMT/CompFrontiers08.pdf finally
showed that a machine that looks to the programmer like a PRAM can be built. In fact, the basic XMT (explicit multi-threading) architecture can be scaled to 1000 on-chip processors.
Based on the first point above (PRAM algorithms are necessary knowledge), we can go ahead immediately and teach PRAM algorithms without waiting for the vendors to converge. This will be a step in the right direction regardless of which approaches to parallelism prevail.
MOTIVATION FOR TEACHING PARALLELISM IN THEORY CLASSES
- It does not make sense to have a new platform of general-purpose
parallel computing succeed the established serial platform without
having a one-to-one match of EVERYTHING, including algorithms and data
structures.
- In particular, it does not make sense to teach parallel
programming without teaching parallel algorithms and data structures.
The gap between programming and algorithms must be bridged, so that the
continuum from algorithms and data-structures to programming will
resemble as much as possible the continuum in serial computing.
- Since the PRAM theory is the only serious candidate developed in nearly three decades of research, PRAM algorithms simply must be taught.
MOTIVATION FOR INCORPORATING PROGRAMMING ASSIGNMENTS IN THEORY COURSES ON PARALLELISM
It would be nice if theorists endorsed the previous argument and used it to convince their colleagues that PRAM algorithms need to be taught. But I am concerned that some of us will teach a course on parallel algorithms as a purely theoretical course WITHOUT any connection to programming. This would miss the point, as it ignores the need to relate algorithms to programming. The Q&A at the end of this text elaborates further on the programming issue.
SUGGESTIONS FOR WHAT TO TEACH AND HOW
- In-Class Presentation:
Uzi Vishkin's UMD course ENEE759K/CMSC751 Parallel Algorithms, Spring 2009, is now fully available on-line.
- PRAM Algorithms: Nearly all the class time should be devoted to PRAM algorithms. There are several excellent sources for that, including the following two books:
J. JaJa. An Introduction to Parallel Algorithms. Addison-Wesley, 1992.
J. Keller, C.W. Kessler and J.L. Traeff. Practical PRAM Programming. Wiley-Interscience, 2001.
I have been using my own class notes.
An important point in teaching this material, regardless of the source, is that this theory is meant as an extension (not a replacement) of the standard serial algorithmic theory. Indeed, a tenet of the PRAM algorithmic theory is the expression of parallelism in terms of "work" and "depth", regardless of the number of processors used by the implementation, be it many processors, few processors, or even a single processor.
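One standard way to connect a work-depth description to a concrete number of processors (a well-known PRAM scheduling fact, added here for context rather than taken from the course notes) is Brent's scheduling principle: an algorithm with work W and depth D can be emulated by p processors in time

```latex
T_p = O\!\left(\frac{W}{p} + D\right)
```

so a single work-depth analysis yields a running-time estimate for any number of processors, from one to unlimited.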
Video recordings of Uzi Vishkin's Spring 2009 Parallel Algorithms lectures are now available for free download.
While around 20% of the material is too advanced for undergraduate students (the recorded course is a graduate course), our initial experience has been that the lectures may be accessible, for general understanding, even to motivated undergraduate freshmen. The main difference has been in the expectations from students at different levels. For example, graduate students had to solve all 45 exercises in the class notes, while freshmen are expected to solve none, or very few. There is also a difference in the programming assignments.
- XMTC Programming. Prepare yourself by first reading Section 2.1, entitled XMTC, in FPGA-Based Prototype of a PRAM-On-Chip Processor. Section 2.1 reviews a modest extension to the C programming language, called XMTC, that allows PRAM-like programming. XMTC essentially adds only two basic commands to C: Spawn and PS (for prefix-sum). It also minimally relaxes the strict synchrony of the PRAM model, without making this relaxation a programming challenge. Devote a total of around 15-20 minutes, similar to slides 37-39 in these slides, to presenting XMTC. Slide 40 can guide a discussion.
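To give students a first taste of the two commands, the classic array-compaction example can be shown right after the slides. The sketch below follows the style of published XMT examples but is reconstructed here, not quoted from the tutorial; names such as psBaseReg and the exact form of ps should be verified against the XMTC Manual:

```
int A[N];            // input array; some entries are zero
int B[N];            // output: the nonzero entries of A, compacted
psBaseReg base = 0;  // shared base register for the prefix-sum command

spawn(0, N-1) {      // start N virtual threads; $ is the thread ID
    int e = 1;       // each thread's local increment
    if (A[$] != 0) {
        ps(e, base); // atomic prefix-sum: e receives the old value of base,
                     // and base is incremented by 1
        B[e] = A[$]; // e is therefore a unique slot in B for this thread
    }
}
// After the spawn block terminates, base holds the number of nonzero entries.
```

Because each ps returns a distinct value, the nonzero elements land in distinct positions of B without any locks; this is exactly the pattern the prefix-sum primitive is designed for.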
- Supporting documentation on XMTC programming. Refer the students to the XMTC Manual and the XMTC tutorial.
- Programming assignments. Please look under "assignments" on this course page, or on this alternative course page. For XMTC code examples, download this paper (pdf, 62 pages). Finally, instructors can request more code examples by e-mail.
- Running programming assignments. In mid-September 2008, the UMD XMT project released the XMT software tools to the public (http://www.umiacs.umd.edu/users/vishkin/XMT/sw-release.html), including:
- a cycle-accurate simulator of the PRAM-On-Chip machine, and
- a compiler from XMTC to that machine.
These tools allow your students to run XMTC code on an emulated XMT machine. As a reminder, a 64-processor hardware prototype of such a machine (using FPGA technology) has been in use at UMD since January 2007. A compiler that translates XMTC to OpenMP is also available for download, giving your students an alternative way to run their assignments.
Finally, please note that this type of programming cannot be too difficult. I gave a one-day parallel algorithms tutorial to a dozen high-school students in Fall 2007, and subsequently some of them managed to complete eight programming assignments on their own. In fact, the link to programming assignments above gives these eight assignments. The only help the high-school students got was one office hour per week with an undergraduate teaching assistant. They did not get any school credit for their work; their participation was in the context of a computer club, after completing their regular school work (8 periods per day).
If you are looking for code examples, you are welcome to write to me.
INTERESTING TEACHING LINKS
-
Spring 2012: Composition of Parallel Algorithms (in Greek), University of Cyprus.
"You cannot imagine the satisfaction a person gets, after designing and analyzing algorithms on the PRAM for more than 12 years, to finally be able to tell students... 'this can be easily turned into code and run on a real-world multi-core machine'!!!" --Prof. Chryssis Georgiou on using the XMT platform in his University of Cyprus class.
-
Spring 2011: Parallel Algorithms, University of California, Davis.
-
Spring 2011: Parallel Algorithms, University of California, San Diego.
-
Winter 2010/1: Parallel Computing, Electrical Engineering, Technion, Israel-Institute-of-Technology.
-
Registration form for Middle School Computer Engineering Summer Camp at the John F. Kennedy High School,
Silver Spring, Maryland, July 13-24, 2009, Montgomery County Public Schools:
"Students will learn about computer programs for the 64-processor desktop-of-the-future computer built
at UMCP and will explore relevant Computer Science and Mathematics concepts".
-
"Parallel Programming Comes to Ingenuity", The Ingenuity Spotlight, February 2009, The Ingenuity Project at the Baltimore Polytechnic Institute
High School.
Note the following comment in the story on page 2: "The three classes in December were part of a study performed by the UMCP Department of Computer Science to test these new teaching methods, and it turned out to be successful: in the questionnaires filled out by the students there was an equal split between those who said the material was at the right level of difficulty and those who said it was too easy."
-
2008-9: Parallel Computing , Thomas Jefferson High School for Science and Technology, Alexandria, VA.
Chapter on Explicit Multi-Threading (XMT).
-
Spring 2009: Algorithms and Programming in a Parallel Environment (syllabus in Hebrew), The Academic College of Tel-Aviv Yaffo.
-
Resources for multicore programming education , webpage maintained by Nir Shavit, Tel Aviv University.
II. SELF-STUDY OF PARALLELISM
Our suggestions for self-study are covered in the on-line tutorial.
Q&A
Q: I never learned parallel programming formally, but I picked up some ideas in my free time from Java/MPI/OpenMP/etc. How do any of these relate to XMTC parallel programming?
A: XMTC parallel programming is simpler and different.
Q: The problem of algorithms being taught independently of programming is present even within the exclusively serial world. What would you say to the many theorists who are resistant to the idea of having a heavy programming component in their courses?
A: IMHO the serial case is completely different. Most students have experienced/learned serial programming BEFORE taking the serial algorithms course. This is NOT the case for parallel programming. My experience is that students learn best if parallel programming is coupled with parallel algorithms. The main difference is that the parallel algorithms course is where parallel programming should FIRST be taught. The reason is that parallelism requires the introduction of some first principles that represent an "alien culture" to students. Serial computing, in contrast, is related to (i) mathematical induction and (ii) the way our brain instructs our body (as a single processor), among other things; very little has prepared us for parallel computing.
Note: The basic points above were first presented at a panel discussion at IPDPS 2008, Miami, Florida. The topic was: "How to avoid making the same mistakes all over again or... how to make the experiences of the parallel processing communities useful for the multi/many-core generation". The slides of the presentation are available here.
The panel presentation also points out that 19 of the 38 participants in an IBM-NSF workshop on future directions for parallel computing, held in December 1988, were algorithms/theory people. Since IPDPS08 had 600 registrants, I asked the IPDPS08 audience to imagine that another group of 600 algorithms/theory people were in the room, and to consider the effect of nearly all of them saying that we should go ahead and incorporate the PRAM into future multi-core architectures. Lance Fortnow's recent blog postings, The Revenge of Parallelism and Teaching Parallelism, led to discussion and further development of these points.