next up previous
Next: Conclusion Up: Performance Evaluation Previous: Experimental Results

Comparison with Previous Results

Despite the enormous theoretical interest in parallel sorting, we were able to locate relatively few empirical studies. Of these, only a few were done on machines which either were available to us for comparison or involved code which could be ported to these machines for comparison. In Tables IV and V, we compare the performance of our sample sort algorithm with two other sample sort algorithms. In all cases, the code was written in Split-C. In the case of Alexandrov et al. [1], the times were determined by us directly on a 32 node CM-5 using code supplied by the authors which had been optimized for a Meiko CS-2. In the case of Dusseau [17], the times were obtained from the graphed results reported for a 64 node CM-5.

 
Table: Total execution time (in seconds) required to sort a variety of benchmarks and problem sizes, comparing our version of sample sort (HBJ) with that of Alexandrov et al. (AIS) on a 32-node CM-5.
We were unable to run the (AIS) code on this input.

 
Table v: Time required per element (in microseconds) to sample sort 64M integers, comparing our results (HBJ) with those obtained from the graphed results reported by Dusseau (DUS) on a 64 node CM-5.

Finally, there are the results for the NAS Parallel Benchmark [31] for integer sorting (IS). The name of this benchmark is somewhat misleading. Instead of requiring that the integers be placed in sorted order as we do, the benchmark only requires that they be ranked without any reordering, which is a significantly simpler task. Table VI compares our results on the Class A NAS Benchmark with the best times reported for the TMC CM-5 and the Cray T3D. We believe that our results, which were obtained using high-level, portable code, compare favorably with the other reported times, which were obtained by the vendors using machine-specific implementations and perhaps system modifications.

 
Table vi: Comparison of our execution time (in seconds) with the best reported times for the Class A NAS Parallel Benchmark for integer sorting. Note that while we actually place the integers in sorted order, the benchmark only requires that they be ranked without actually reordering.

The only performance studies we are aware of on similar platforms for generalized sorting are those of Tridgell and Brent [33], who report the performance of their algorithm using a 32 node CM-5 on a uniformly distributed random input of signed integers, as described in Table VII.

 
Table vii: Total execution time (in seconds) required to sort 8M signed integers, comparing our results (HBJ) with those of Tridgell and Brent (TB) on a 32 node CM-5.



next up previous
Next: Conclusion Up: Performance Evaluation Previous: Experimental Results

helman@umiacs.umd.edu