Data Intensive Computing

Overview

UMIACS develops and supports Data Intensive Computing systems with approximately one petabyte of persistent storage. Each lab has unique requirements, so the Institute's storage systems are heterogeneous; they are built on a variety of storage components and employ many different storage models, including:

  • Storage Area Networks built on Engenio, 3PAR, Compellent, and DataDirect Networks systems to support File Servers, Relational Database Management Systems, Network Attached Storage Gateways, and VMware.
  • Large arrays of Direct-Attached Disks using Nexenta ZFS and OpenStack.
  • Parallel File Systems based on GPFS and Lustre.
  • Distributed File Systems based on Gluster.
  • MapReduce Systems based on Cloudera Hadoop.
  • Data Grids based on the Storage Resource Broker (SRB) and the Integrated Rule-Oriented Data System (iRODS); see the sketch after this list.
  • Tape-based Storage managed by IBM Tivoli Storage Manager.
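
As a concrete illustration of the data-grid model, the sketch below shows how a user might upload a file to, and list a collection in, an iRODS zone using the python-irodsclient package. This is a minimal sketch, not a supported access method: the host name, zone, account, and paths are hypothetical placeholders, not actual UMIACS endpoints.

    from irods.session import iRODSSession

    # Hypothetical connection details; 1247 is the default iRODS port.
    with iRODSSession(host='irods.example.org', port=1247,
                      user='alice', password='secret',
                      zone='umiacsZone') as session:
        # Upload a local file into the data grid as a data object.
        session.data_objects.put('results.csv',
                                 '/umiacsZone/home/alice/results.csv')
        # List the contents of the user's home collection.
        home = session.collections.get('/umiacsZone/home/alice')
        for obj in home.data_objects:
            print(obj.name, obj.size)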

Selected Facilities

GemBox

A three-hundred-terabyte storage cloud built to facilitate the exchange of biological and medical data and imagery for the Center for Bioinformatics and Computational Biology (CBCB). The system runs the Gluster file system (GlusterFS) and Red Hat Enterprise Linux on a cluster of Dell storage servers equipped with large arrays of commodity near-line SAS disks and LSI RAID-on-Chip disk controllers. User access is managed through the AjaXplorer web interface.
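
Gluster volumes are typically mounted through the GlusterFS FUSE client, after which they behave like an ordinary POSIX file system. The minimal sketch below, assuming a hypothetical mount point of /mnt/gembox, tallies how much of the volume each top-level project directory consumes.

    import os

    # Hypothetical FUSE mount point for the GemBox Gluster volume.
    MOUNT = '/mnt/gembox'

    # Sum file sizes under each top-level directory of the volume.
    totals = {}
    for root, dirs, files in os.walk(MOUNT):
        for name in files:
            path = os.path.join(root, name)
            top = os.path.relpath(path, MOUNT).split(os.sep)[0]
            totals[top] = totals.get(top, 0) + os.path.getsize(path)

    for project, size in sorted(totals.items()):
        print('%s: %.2f TB' % (project, size / 1e12))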

Bespin

An eighty-terabyte Hadoop system built to support the Cloud Computing Center (CCC), the Laboratory for Computational Linguistics and Information Processing (CLIP), and the Language and Media Processing Laboratory (LAMP). The system is built on Supermicro Twin2 servers running Red Hat Enterprise Linux and Cloudera Hadoop.
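
Hadoop's Streaming interface lets jobs on a cluster like Bespin be written in any language that reads standard input and writes standard output. Below is a minimal word-count sketch in Python; the input and output paths in the comment are hypothetical placeholders, and the script stands in for whatever analysis a lab would actually run.

    #!/usr/bin/env python
    # Minimal word-count sketch for Hadoop Streaming. A hypothetical
    # invocation (paths are placeholders, not real Bespin datasets):
    #   hadoop jar hadoop-streaming.jar -file wordcount.py \
    #       -mapper 'wordcount.py map' -reducer 'wordcount.py reduce' \
    #       -input /data/text -output /data/counts
    import sys

    def mapper():
        # Emit a (word, 1) pair for every token read from stdin.
        for line in sys.stdin:
            for word in line.split():
                print('%s\t1' % word)

    def reducer():
        # Streaming sorts mapper output by key, so each word's
        # counts arrive as a contiguous run that can be summed.
        current, count = None, 0
        for line in sys.stdin:
            word, _, n = line.rstrip('\n').partition('\t')
            if word != current:
                if current is not None:
                    print('%s\t%d' % (current, count))
                current, count = word, 0
            count += int(n)
        if current is not None:
            print('%s\t%d' % (current, count))

    if __name__ == '__main__':
        # 'map' selects the mapper; any other argument runs the reducer.
        mapper() if sys.argv[1:] == ['map'] else reducer()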