TY - JOUR T1 - On implementing graph cuts on cuda JF - First Workshop on General Purpose Processing on Graphics Processing Units Y1 - 2007 A1 - Hussein,M. A1 - Varshney, Amitabh A1 - Davis, Larry S. AB - The Compute Unified Device Architecture (CUDA)has enabled graphics processors to be explicitly programmed as general-purpose shared-memory multi-core processors with a high level of parallelism. In this paper, we present our preliminary results of implementing the Graph Cuts algorithm on CUDA. Our primary focus is on implementing Graph Cuts on grid graphs, which are extensively used in imaging applications. We first explain our implementation of breadth first search (BFS) graph traversal on CUDA, which is extensively used in our Graph Cuts implementation. We then present a basic implementation of Graph Cuts that succeeds to achieve absolute and relative speedups when used for foreground-background segmentation on synthesized images. Finally, we introduce two optimizations that utilize the special structure of grid graphs. The first one is lockstep BFS, which is used to reduce the overhead of BFS traversals. The second is cache emulation, which is a general technique to regularize memory access patterns and hence enhance memory access throughput. We experimentally show how each of the two optimizations can enhance the performance of the basic implementation on the image segmentation application. ER -