Maryland CPU-GPU Cluster Infrastructure

Facilities and Infrastructure

We have built a high-performance computing and visualization cluster that takes advantage of the synergies afforded by coupling central processing units (CPUs), graphics processing units (GPUs), displays, and storage.

Hardware

CPU-GPU Cluster Infrastructure

The cluster consists of the following hardware configuration:

CPU-GPU Cluster Rack

Each node consists of:

Software

We have installed several software packages on the pilot cluster. In anticipation of the larger cluster, our operating system work has focused on automating the installation of the customized components and on automated mechanisms for updating them across a much larger cluster. We have configured and tested a number of software packages, including BrookGPU, NVIDIA Cg, the NVIDIA SDK, the NVIDIA Scene Graph SDK, the GNU Compiler Collection, the NAGware FORTRAN Compiler, the Fastest Fourier Transform in the West (FFTW), and MATLAB.

We have integrated our CPU-GPU cluster with our standard resource allocation mechanisms: the Portable Batch System, the Maui Scheduler, and Condor. Users can request immediate allocations for batch or interactive jobs, or they can reserve nodes in advance of a demo. Unused nodes are donated to the Condor pool and may be used by any researcher in the Institute.

We have experimented with several software packages for distributing the rendering of the tiled display, including Chromium, DMX, and OpenSG. For scientific visualization tasks, we use ParaView and custom applications built with Kitware's Visualization Toolkit (VTK).
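As an illustration of the batch and interactive allocation requests described above, the following minimal sketch assembles a Portable Batch System `qsub` command line without running it. The helper name `build_qsub_command`, the script name `render_job.sh`, and the node counts and time limits are hypothetical; the cluster's actual queues, limits, and site policies are not given here, and advance reservations through the scheduler are not covered.

```python
import shlex


def build_qsub_command(nodes, walltime, interactive=False, script=None, queue=None):
    """Return the argument list for a PBS qsub request (hypothetical helper).

    The command is only assembled, never executed, so the sketch can be
    tried on a machine without PBS installed.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not interactive and script is None:
        raise ValueError("a batch job needs a script path")

    argv = ["qsub"]
    if interactive:
        argv.append("-I")  # -I requests an interactive session
    if queue is not None:
        argv += ["-q", queue]  # -q selects a destination queue
    # -l carries the resource list: node count and wall-clock limit
    argv += ["-l", f"nodes={nodes},walltime={walltime}"]
    if not interactive:
        argv.append(script)  # batch jobs end with the job script
    return argv


# Assumed example values: a 4-node batch job and a 2-node interactive session.
batch = build_qsub_command(4, "02:00:00", script="render_job.sh")
demo = build_qsub_command(2, "00:30:00", interactive=True)
print(shlex.join(batch))
print(shlex.join(demo))
```

Building the argument list separately from executing it makes the request easy to inspect or log before it is handed to the batch system.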

Visualization on the LCD Display Wall (LLNL Dataset)
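Distributing rendering across a tiled display amounts to assigning each panel a rectangular region of one large virtual framebuffer. The sketch below computes those regions for a regular grid. The grid size (4 x 2) and per-panel resolution (1280 x 1024) are assumptions made for illustration, not the dimensions of the display wall described here, and bezel compensation is ignored. It is not the configuration format of Chromium, DMX, or OpenSG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """One panel's region of the virtual framebuffer, in pixels."""
    col: int
    row: int
    x: int       # left edge in the virtual framebuffer
    y: int       # top edge in the virtual framebuffer
    width: int
    height: int


def tile_layout(cols, rows, tile_width, tile_height):
    """Return the tiles of a cols x rows wall in row-major order."""
    return [
        Tile(col, row, col * tile_width, row * tile_height, tile_width, tile_height)
        for row in range(rows)
        for col in range(cols)
    ]


def wall_resolution(cols, rows, tile_width, tile_height):
    """Total resolution of the virtual framebuffer spanning all tiles."""
    return cols * tile_width, rows * tile_height


# Hypothetical wall: 4 columns by 2 rows of 1280 x 1024 panels.
layout = tile_layout(4, 2, 1280, 1024)
for tile in layout:
    print(tile)
print(wall_resolution(4, 2, 1280, 1024))
```

Each rendering node would then draw only the region assigned to the panel it drives.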

We have deployed an infrastructure for interacting with the cluster using standard desktops, cluster-connected workstations, and large-scale display technologies. The cluster nodes and cluster-connected workstations are installed in a secure data center, and their displays and terminals are in the graphics lab and a neighboring public lab. We use digital KVM as a practical technology for interacting with the CPU-GPU nodes from a data center console or from remote IP clients. Although this technology does not support high-resolution video or suitable refresh rates, it is invaluable for troubleshooting and managing the nodes. We are also evaluating tools that provide software-based remote display access, based on VNC and the HP Remote Workstation Solutions. These solutions allow remote users to monitor the cluster displays at reduced frame rates over local area networks, a key capability for developers who may wish to work from their own desks while interacting with the cluster.