CS-TR-3549, UMIACS-TR-95-102
Previous schemes for sorting on general-purpose parallel machines
have had to choose between poor load balancing and irregular
communication or multiple rounds of all-to-all personalized
communication. In this paper, we
introduce a novel variation on sample sort which uses only two rounds
of regular all-to-all personalized communication in a scheme that
yields very good load balancing with virtually no overhead. This
algorithm was implemented in Split-C and run on a variety of
platforms, including the Thinking Machines CM-5, the IBM SP-2, and the
Cray Research T3D. We ran our code using widely different benchmarks
to examine the dependence of our algorithm on the input distribution.
Our experimental results are consistent with the theoretical analysis
and illustrate the efficiency and scalability of our algorithm across
different platforms. In fact, it seems to outperform all similar
algorithms known to the authors on these platforms, and its
performance is invariant over the set of input distributions unlike
previous efficient algorithms. Our results also compare favorably with
those reported for the simpler ranking problem posed by the NAS
Integer Sorting (IS) Benchmark.
HTML
or
PostScript
or
Compressed PostScript
Version of this report.
For more infomation on any of the these topics, click on the hotlink.
Any queries, comments, or inquiries to:
David R. Helman
E-mail: helman@umiacs.umd.edu
Office phone: (301)405-6757
FAX: (301)314-9658

Return to the Experimental Parallel Algorithmics page.
