A Full On-Line Course on Parallel Algorithms Coupled with XMTC Programming Assignments
Background
The so-called PRAM (for parallel random-access machine, or model) parallel algorithmic theory
has been widely embraced by the Computer Science theory and algorithms communities, as a general-purpose
approach to parallel algorithms. This is a quite unique distinction of the PRAM approach.
Starting with PRAM algorithms, the Explicit Multi-Threading (XMT) project, which started at the University of Maryland (UMD) in 1997, provides a holistic approach to desktop supercomputing. A 64-processor XMT machine has been extensively tested and used at UMD since early 2007. Designed in anticipation of 1000-processor on-chip parallel machines in the early 2010s, the XMT architecture can be programmed effectively in XMTC, a PRAM-like programming language that adds only two simple instructions to the standard C programming language.
For more background on the XMT approach, please see the XMT home page: http://www.umiacs.umd.edu/users/vishkin/XMT
Learning the basic concepts of parallel algorithms and programming without actually doing the programming is insufficient (perhaps like learning to swim by correspondence).
The platform noted on this web page will allow you to use your own computer to practice programming in an XMT environment.
We suggest material for courses on parallel algorithms and parallel programming in Section I below. Section II provides a link to suggestions for self-study of this material.
I. TEACHING PARALLELISM
MOTIVATION FOR EDUCATORS
The huge investment in parallel computing over several decades has produced a great deal of knowledge. However, the world has yet to see a truly successful general-purpose parallel computer, judged by the completion time of a single task.
Still, even the most conservative computer vendors have decided to abandon the serial paradigm that worked for them for so long and to bet the future of their companies on parallel computing. It is important to realize that they really did not have a choice: more than five years have already passed since around 2003, when the increased power consumption of computer chips froze progress in clock frequency.
The actions of vendors so far are truly puzzling:
- They have essentially bet their future on approaches that are proven failures (as general-purpose approaches).
- While they are committed to increasing the number of processors on a chip exponentially over the coming decade, they are not giving enough details on their architectures, or on how these architectures will be programmed effectively. This deters application software vendors (ASVs) who are considering big investments: a competitor ASV could provide a better product for a fraction of the cost simply by waiting a few years until the dust settles on these architectures. This ASV impasse can lead to a serious recession in a significant sector of the IT market.
- The PRAM approach has been a decisive winner (when it comes to parallel computing) in the theory/algorithms community. Vendors would like to replace serial computing with parallel computing, but often do not consider algorithms, or how to teach them, to be part of their concern.
What can we do as CS educators?
There is no longer any question of whether we should teach parallel algorithms and programming at all levels; the question is when. IMHO, we will deserve malpractice suits if we defer this teaching for too long. Currently, we produce 22-year-old dinosaurs trained for a 50-year career dominated by parallelism by having them program yesterday's computers. In fact, we do not only under-teach: we mis-teach bad serial habits that will make it much more difficult to switch to parallelism later.
Luckily, the ASV impasse noted above does not affect education.
- PRAM algorithms are necessary knowledge for any CS graduate, as explained next. In the same way that the serial approach requires accounting for the total number of operations (time complexity) of an algorithm, the PRAM approach requires accounting for the total number of operations of a parallel algorithm (its "work") and for its time under the assumption that unlimited hardware is available (its "depth"). This work-depth level of cognition falls within the common denominator of all other approaches to parallel computing, and therefore any CS graduate will have to study it.
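To make the work-depth accounting concrete, consider summing n numbers. A balanced binary tree of additions performs n-1 additions in total (work O(n)) but needs only log n rounds (depth O(log n)). The XMTC-style sketch below is added here purely for illustration and is not from the original text; the exact syntax should be checked against the XMTC Manual:

```
// Sum n numbers (n a power of 2) stored in A[0..n-1]; the result ends in A[0].
// Work: n-1 additions in total.  Depth: log2(n) rounds.
int s;
for (s = n/2; s >= 1; s = s/2) {    // log2(n) rounds, one after the other
    spawn(0, s-1) {                 // s virtual threads execute in parallel
        A[$] = A[2*$] + A[2*$+1];   // $ denotes the thread's ID in XMTC
    }
}
```

Whatever number of processors ends up executing the threads, the work and depth counts are unchanged; this processor-independence is exactly what the work-depth methodology provides.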
- PRAM algorithms are also sufficient. In the past, the PRAM was criticized as being too simplistic for practice. However, the 64-processor PRAM-On-Chip hardware prototype built at UMD
http://www.umiacs.umd.edu/users/vishkin/XMT/CompFrontiers08.pdf finally
showed that a machine that looks to the programmer like a PRAM can be built. In fact, the basic XMT (explicit multi-threading) architecture can be scaled to 1000 on-chip processors.
Based on the first point above (PRAM algorithms are necessary knowledge), we can go ahead immediately and teach PRAM algorithms without waiting for the vendors to converge. This will be a step in the right direction regardless of which approaches to parallelism prevail.
MOTIVATION FOR TEACHING PARALLELISM IN THEORY CLASSES
- It does not make sense to have a new platform of general-purpose
parallel computing succeed the established serial platform without
having a one-to-one match of EVERYTHING, including algorithms and data
structures.
- In particular, it does not make sense to teach parallel
programming without teaching parallel algorithms and data structures.
The gap between programming and algorithms must be bridged, so that the
continuum from algorithms and data-structures to programming will
resemble as much as possible the continuum in serial computing.
- Since the PRAM theory is the only serious candidate developed in nearly three decades of research, PRAM algorithms simply must be taught.
MOTIVATION FOR INCORPORATING PROGRAMMING ASSIGNMENTS IN THEORY COURSES ON PARALLELISM
It would be nice if theorists endorsed the previous argument and used it to convince their colleagues that PRAM algorithms need to be taught. But I am concerned that some of us will teach a course on parallel algorithms as a purely theoretical course WITHOUT any connection to programming. This would miss the point, as it ignores the need to relate algorithms to programming. The Q&A at the end of this text elaborates further on the programming issue.
SUGGESTIONS FOR WHAT TO TEACH AND HOW
- In-Class Presentation:
Uzi Vishkin's UMD course ENEE759K/CMSC751 Parallel Algorithms, Spring 2009, is now fully available on-line.
- PRAM Algorithms: Nearly all the class time should be devoted to PRAM algorithms. There are several excellent sources for that, including the following two books:
J. JaJa. An Introduction to Parallel Algorithms. Addison-Wesley, 1992.
J. Keller, C.W. Kessler and J.L. Traeff. Practical PRAM Programming. Wiley-Interscience, 2001.
I have been using my own class notes.
An important point in teaching this material, regardless of the source, is that this theory is meant as an extension (not a replacement) of the standard serial algorithmic theory. Indeed, a tenet of the PRAM algorithmic theory is the expression of parallelism in terms of "work" and "depth", regardless of the number of processors used by the implementation, be it many processors, few processors, or even a single processor.
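One standard way to connect a work-depth description to a concrete number of processors (a well-known PRAM scheduling fact, added here for context rather than taken from the course notes) is Brent's scheduling principle: an algorithm with work W and depth D can be emulated by p processors in time

```latex
T_p = O\!\left(\frac{W}{p} + D\right)
```

so a single work-depth analysis yields a running-time estimate for any number of processors, from one to unlimited.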
Video recordings of Uzi Vishkin's Spring 2009 Parallel Algorithms lectures are now available for free download.
While around 20% of the material is too advanced for undergraduate students (the recorded course is a graduate course), our initial experience has been that the lectures may be accessible, for general understanding, even to motivated undergraduate freshmen. The main difference has been in the expectations from students at different levels. For example, graduate students had to solve all 45 exercises in the class notes, while freshmen are expected to solve none, or very few. There is also a difference in the programming assignments.
- XMTC Programming. Prepare yourself by first reading Section 2.1, entitled XMTC, in FPGA-Based Prototype of a PRAM-On-Chip Processor. Section 2.1 reviews a modest extension to the C programming language, called XMTC, that allows PRAM-like programming. XMTC essentially adds only two basic commands to C: Spawn and PS (for prefix-sum). It also minimally relaxes the strict synchrony of the PRAM model, without making this relaxation a programming challenge. Devote a total of around 15-20 minutes, similar to slides 37-39 in these slides, to presenting XMTC. Slide 40 can guide a discussion.
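To give students a first taste of the two commands, the classic array-compaction example can be shown right after the slides. The sketch below follows the style of published XMT examples but is reconstructed here, not quoted from the tutorial; names such as psBaseReg and the exact form of ps should be verified against the XMTC Manual:

```
int A[N];            // input array; some entries are zero
int B[N];            // output: the nonzero entries of A, compacted
psBaseReg base = 0;  // shared base register for the prefix-sum command

spawn(0, N-1) {      // start N virtual threads; $ is the thread ID
    int e = 1;       // each thread's local increment
    if (A[$] != 0) {
        ps(e, base); // atomic prefix-sum: e receives the old value of base,
                     // and base is incremented by 1
        B[e] = A[$]; // e is therefore a unique slot in B for this thread
    }
}
// After the spawn block terminates, base holds the number of nonzero entries.
```

Because each ps returns a distinct value, the nonzero elements land in distinct positions of B without any locks; this is exactly the pattern the prefix-sum primitive is designed for.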
- Supporting documentation on XMTC programming. Refer the students to the XMTC Manual and the XMTC tutorial.
- Programming assignments. Please look under "assignments" on this course page, or on this alternative course page. For XMTC code examples, download this paper (pdf, 62 pages). Finally, instructors can request more code examples by e-mail.
- Running programming assignments. In mid-September 2008, the UMD XMT project released the XMT software tools to the public (http://www.umiacs.umd.edu/users/vishkin/XMT/sw-release.html), including:
- a cycle-accurate simulator of the PRAM-On-Chip machine, and
- a compiler from XMTC to that machine.
These tools allow your students to run XMTC code on an emulated XMT machine. As a reminder, a 64-processor hardware prototype of such a machine (using FPGA technology) has been in use at UMD since January 2007. A compiler that translates XMTC to OpenMP is also available for download, giving your students an alternative way to run their assignments.
Finally, please note that this type of programming cannot be too difficult. I gave a one-day parallel algorithms tutorial to a dozen high-school students in Fall 2007, and subsequently some of them managed to complete eight programming assignments on their own. In fact, the link to programming assignments above gives these eight assignments. The only help the high-school students got was one office hour per week with an undergraduate teaching assistant. They did not get any school credit for their work; their participation was in the context of a computer club, after completing their regular school work (8 periods per day).
If you are looking for code examples, you are welcome to write to me.
INTERESTING TEACHING LINKS
-
Spring 2012: Composition of Parallel Algorithms (in Greek), University of Cyprus.
"You cannot imagine the satisfaction a person gets, after designing and analyzing algorithms on the PRAM for more than 12 years, to finally be able to tell students... 'this can be easily turned into code and run on a real-world multi-core machine'!!!" --Prof. Chryssis Georgiou on using the XMT platform in his University of Cyprus class.
-
Spring 2011: Parallel Algorithms, University of California, Davis.
-
Spring 2011: Parallel Algorithms, University of California, San Diego.
-
Winter 2010/1: Parallel Computing, Electrical Engineering, Technion, Israel-Institute-of-Technology.
-
Registration form for Middle School Computer Engineering Summer Camp at the John F. Kennedy High School,
Silver Spring, Maryland, July 13-24, 2009, Montgomery County Public Schools:
"Students will learn about computer programs for the 64-processor desktop-of-the-future computer built
at UMCP and will explore relevant Computer Science and Mathematics concepts".
-
"Parallel Programming Comes to Ingenuity", The Ingenuity Spotlight, February 2009, The Ingenuity Project at the Baltimore Polytechnic Institute
High School.
Note the following comment in the story on page 2: "The three classes in December were part of a study performed by the UMCP Department of Computer Science to test these new teaching methods, and it turned out to be successful: in the questionnaires filled out by the students there was an equal split between those who said the material was at the right level of difficulty and those who said it was too easy."
-
2008-9: Parallel Computing , Thomas Jefferson High School for Science and Technology, Alexandria, VA.
Chapter on Explicit Multi-Threading (XMT).
-
Spring 2009: Algorithms and Programming in a Parallel Environment (syllabus in Hebrew), The Academic College of Tel-Aviv Yaffo.
-
Resources for multicore programming education , webpage maintained by Nir Shavit, Tel Aviv University.
II. SELF-STUDY OF PARALLELISM
Our suggestions for self-study are covered in the on-line tutorial.
Q&A
Q: I never learned parallel programming formally, but I picked up some ideas in my free time from Java/MPI/OpenMP/etc. How do any of these relate to XMTC parallel programming?
A: XMTC parallel programming is simpler and different.
Q: The problem of algorithms being taught independently of programming is present even within the exclusively serial world. What would you say to the many theorists who are resistant to the idea of having a heavy programming component in their courses?
A: IMHO the serial case is completely different. Most students have experienced/learned serial programming BEFORE taking the serial algorithms course. This is NOT the case for parallel programming. My experience is that students learn best if parallel programming is coupled with parallel algorithms. The main difference is that the parallel algorithms course is where parallel programming should FIRST be taught. The reason is that parallelism requires the introduction of some first principles that represent an "alien culture" to students. Serial computing, in contrast, is related to (i) mathematical induction and (ii) the way our brain instructs our body (as a single processor), among other things; very little has prepared us for parallel computing.
Note: The basic points above were first presented at a panel discussion at IPDPS 2008, Miami, Florida. The topic was: "How to avoid making the same mistakes all over again or... how to make the experiences of the parallel processing communities useful for the multi/many-core generation". The slides of the presentation are available here.
The panel presentation also points out that 19 of the 38 participants in an IBM-NSF workshop on future directions for parallel computing, held in December 1988, were algorithms/theory people. Since IPDPS08 had 600 registrants, I asked the IPDPS08 audience to imagine that another group of 600 algorithms/theory people were in the room, and to consider the effect of nearly all of them saying that we should go ahead and incorporate the PRAM into future multi-core architectures. Lance Fortnow's recent blog postings, The Revenge of Parallelism and Teaching Parallelism, led to discussion and further development of these points.