We start by describing the counting sort algorithm used to sort on individual blocks of the keys. The Counting Sort algorithm sorts n integers in the range by using R counters to accumulate the number of keys equal to i in bucket , for , followed by determining the rank of the each element. Once the rank of each element is known, we can use our h-relation personalized communication to move each element into the correct position; in this case . Counting Sort is a stable sorting routine, that is, if two keys are identical, their relative order in the final sort remains the same as their initial order.
In a practical integer sorting problem, we expect . The pseudocode for our Counting Sort algorithm uses six major steps and is as follows.
The analysis of our counting sort algorithm is as follows. Steps (1), (3), and (5) execute in O local computation time with no communication. Steps (2), (4), and (6) are communication supersets and have the following analysis. Steps (2) and (4) are the transpose primitive with block sizes and and hence result in O communication. Step (6) uses the personalized communication primitive for n elements distributed equally across p processors. Because this routing is a permutation , it has the following complexity
provided that . Thus, the overall complexity of our Counting Sort algorithm is given by
Notice that an obvious lower bound to sort the integers is , and hence our algorithm is asymptotically optimal when and .