TY - JOUR
T1 - The block distributed memory model
JF - Parallel and Distributed Systems, IEEE Transactions on
Y1 - 1996
A1 - JaJa, Joseph F.
A1 - Ryu,Kwan Woo
KW - accesses;single
KW - address
KW - algorithms;performance
KW - algorithms;performance;remote
KW - allocation;sorting;
KW - balancing;matrix
KW - bandwidth;communication
KW - communication;load
KW - complexity;computation
KW - complexity;computational
KW - complexity;cost
KW - complexity;distributed
KW - distributed
KW - evaluation;resource
KW - FFT;block
KW - Fourier
KW - latency;parallel
KW - locality;communication
KW - machines;interconnection
KW - measure;data
KW - memory
KW - model;communication
KW - model;computational
KW - multiplication;memory
KW - multiplication;parallel
KW - problems;distributed
KW - rearrangement
KW - space;sorting;spatial
KW - systems;fast
KW - topology;interprocessor
KW - transforms;matrix
AB - We introduce a computation model for developing and analyzing parallel algorithms on distributed memory machines. The model allows the design of algorithms using a single address space and does not assume any particular interconnection topology. We capture performance by incorporating a cost measure for interprocessor communication induced by remote memory accesses. The cost measure includes parameters reflecting memory latency, communication bandwidth, and spatial locality. Our model allows the initial placement of the input data and pipelined prefetching. We use our model to develop parallel algorithms for various data rearrangement problems, load balancing, sorting, FFT, and matrix multiplication. We show that most of these algorithms achieve optimal or near optimal communication complexity while simultaneously guaranteeing an optimal speed-up in computational complexity. Ongoing experimental work in testing and evaluating these algorithms has thus far shown very promising results
VL - 7
SN - 1045-9219
CP - 8
M3 - 10.1109/71.532114
ER -