-To appear in the Journal of Parallel and Distributed Computing.
Previous schemes for sorting on general-purpose parallel
machines have had to choose between poor load balancing and irregular
communication or multiple rounds of all-to-all personalized communication.
In this paper,
we introduce a novel variation on sample sort which
uses only two rounds of regular all-to-all personalized communication in
a scheme that yields very good load balancing with virtually no overhead.
Moreover, unlike previous variations, our algorithm efficiently handles the
presence of duplicate values without the overhead of tagging each element
with a unique identifier.
This algorithm was implemented in Split-C and run on a variety of platforms,
including the Thinking Machines CM-5, the IBM SP-2, and the Cray Research
T3D. We ran our code using widely different benchmarks
to examine the dependence of our algorithm on the input
distribution. Our experimental results illustrate the
efficiency and scalability of our
algorithm across different platforms. In fact, it seems to outperform
all similar algorithms known to the authors on these platforms, and its
performance is invariant over the set of input distributions unlike
previous efficient algorithms. Our results also compare favorably with
those reported for the simpler ranking problem posed by the NAS Integer
Sorting (IS) Benchmark.
Version of this report.
Click here to download this sorting code (.tar.Z)
For more infomation on any of the these topics, click on the hotlink.
Any queries, comments, or inquiries to:
David R. Helman
Office phone: (301)405-6757
Return to the Experimental Parallel Algorithmics page.