CS-TR-3549, UMIACS-TR-95-102

A Parallel Sorting Algorithm With an Experimental Study

Previous schemes for sorting on general-purpose parallel machines have had to choose between poor load balancing and irregular communication or multiple rounds of all-to-all personalized communication. In this paper, we introduce a novel variation on sample sort which uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. This algorithm was implemented in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, the IBM SP-2, and the Cray Research T3D. We ran our code using widely different benchmarks to examine the dependence of our algorithm on the input distribution. Our experimental results are consistent with the theoretical analysis and illustrate the efficiency and scalability of our algorithm across different platforms. In fact, it seems to outperform all similar algorithms known to the authors on these platforms, and its performance is invariant over the set of input distributions unlike previous efficient algorithms. Our results also compare favorably with those reported for the simpler ranking problem posed by the NAS Integer Sorting (IS) Benchmark.

HTML or PostScript or Compressed PostScript Version of this report.

   David R. Helman
   E-mail: helman@umiacs.umd.edu
   Office phone: (301)405-6757 
   FAX: (301)314-9658

Return to the Experimental Parallel Algorithmics page.

CS-TR-3549, UMIACS-TR-95-102

A Parallel Sorting Algorithm With an Experimental Study

University of Maryland

Department of Computer Science and Department of Electrical Engineering, and

The University of Maryland Institute for Advanced Computer Studies