A typical operation in our image processing algorithms requires that each pixel be updated based on the pixel values of a small neighborhood. Such an operation amounts to a regular grid shift of a element parallel variable data array.

Two phases occur in the regular grid shift:

- Computation Time
- = Each node accesses its own local memory to shift up its own subgrid;
- Communication Time
- = Each node sends and receives the elements lying along
the shifted subgrid border.

For processors, takes O =
O time.

Next we make some observations about the communication phase:

- All communications are regular:

sends to ,

receives from ; - This regular permutation is known a priori;

This communication pattern is a block permutation, and the time complexity for this operation is given in (3). Note that each node must send a constant number of upper rows of its subgrid to its north adjacent node in the major grid. There are elements along this border. Thus, , and the communications time is as follows:

Therefore, a regular grid shift of a data
array on **p** nodes connected by a CM-5 fat-tree has the following time
and communication complexity:

David A. Bader

dbader@umiacs.umd.edu