TY - CONF
T1 - A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms
T2 - Parallel Distributed Processing Symposium (IPDPS), 2011 IEEE International
Y1 - 2011
A1 - Wei, Zheng
A1 - JaJa, Joseph F.
KW - architecture;graphics
KW - B-tree
KW - C1060;central
KW - construction;multicore
KW - CPU;multithreaded
KW - data
KW - device
KW - dictionary
KW - equipment;coprocessors;data
KW - files
KW - GPU;computer
KW - graphic
KW - indexer;Intel
KW - pipelined
KW - platform;high-throughput
KW - PROCESSING
KW - Quad-core;NVIDIA
KW - strategy;hybrid
KW - structure;CUDA
KW - structure;inverted
KW - structures;multiprocessing
KW - systems;
KW - Tesla
KW - trie
KW - unified
KW - unit;computer
KW - unit;heterogeneous
KW - X5560
KW - Xeon
AB - Given a collection of documents residing on a disk, we develop a new strategy for processing these documents and building the inverted files extremely fast. Our approach is tailored for a heterogeneous platform consisting of a multicore CPU and a highly multithreaded GPU. Our algorithm is based on a number of novel techniques including: (i) a high-throughput pipelined strategy that produces parallel parsed streams that are consumed at the same rate by parallel indexers, (ii) a hybrid trie and B-tree dictionary data structure in which the trie is represented by a table for fast look-up and each B-tree node contains string caches, (iii) allocation of parsed streams with frequent terms to CPU threads and the rest to GPU threads so as to match the throughput of parsed streams, and (iv) optimized CUDA indexer implementation that ensures coalesced memory accesses and effective use of shared memory. We have performed extensive tests of our algorithm on a single node (two Intel Xeon X5560 Quad-core) with two NVIDIA Tesla C1060 attached to it, and were able to achieve a throughput of more than 262 MB/s on the ClueWeb09 dataset. Similar results were obtained for widely different datasets. The throughput of our algorithm is superior to the best known algorithms reported in the literature even when compared to those run on large clusters.
JA - Parallel Distributed Processing Symposium (IPDPS), 2011 IEEE International
M3 - 10.1109/IPDPS.2011.107
ER -