TY - JOUR T1 - The block distributed memory model JF - Parallel and Distributed Systems, IEEE Transactions on Y1 - 1996 A1 - JaJa, Joseph F. A1 - Ryu,Kwan Woo KW - accesses;single KW - address KW - algorithms;performance KW - algorithms;performance;remote KW - allocation;sorting; KW - balancing;matrix KW - bandwidth;communication KW - communication;load KW - complexity;computation KW - complexity;computational KW - complexity;cost KW - complexity;distributed KW - distributed KW - evaluation;resource KW - FFT;block KW - Fourier KW - latency;parallel KW - locality;communication KW - machines;interconnection KW - measure;data KW - memory KW - model;communication KW - model;computational KW - multiplication;memory KW - multiplication;parallel KW - problems;distributed KW - rearrangement KW - space;sorting;spatial KW - systems;fast KW - topology;interprocessor KW - transforms;matrix AB - We introduce a computation model for developing and analyzing parallel algorithms on distributed memory machines. The model allows the design of algorithms using a single address space and does not assume any particular interconnection topology. We capture performance by incorporating a cost measure for interprocessor communication induced by remote memory accesses. The cost measure includes parameters reflecting memory latency, communication bandwidth, and spatial locality. Our model allows the initial placement of the input data and pipelined prefetching. We use our model to develop parallel algorithms for various data rearrangement problems, load balancing, sorting, FFT, and matrix multiplication. We show that most of these algorithms achieve optimal or near optimal communication complexity while simultaneously guaranteeing an optimal speed-up in computational complexity. Ongoing experimental work in testing and evaluating these algorithms has thus far shown very promising results VL - 7 SN - 1045-9219 CP - 8 M3 - 10.1109/71.532114 ER -