
The University of Maryland and IBM Partnership
Current Projects


Project 1) Dynamic Instrumentation and Cache Measurements Using IBM Power4 Systems

UMD PI: Jeff Hollingsworth; with participating faculty: Pugh.


Project 2) Exploration of IBM T221 hi-res display by HCIL

UMD PI: Ben Bederson; with participating faculty: Shneiderman.
IBM Collaborators: Kai Schleupen and John Karat, IBM Research.

Project 3) MALACH: Multilingual access to large spoken archives

PI: Doug Oard; with participating faculty: Dorr, Resnik, and Doermann.
IBM Collaborators: Michael Picheny, Bhuvana Ramabhadran, Todd Ward, Martin Franz.


Project 1) Dynamic Instrumentation and Cache Measurements Using IBM Power4 Systems

This proposal outlines a collaboration among IBM Research, the Advanced Computing Technology Center (ACTC), the High Performance System Software group at the University of Maryland, and the Paradyn Parallel Performance Tools group at the University of Wisconsin. The main research topics that will be investigated in this project include:
           1. Memory system performance
           2. Memory model for Java
           3. Dynamic instrumentation support for OpenMP.

Currently, all three principal investigators have research funding for the students and staff who will carry out the proposed work. However, to support this collaboration, we propose that IBM provide one IBM p670 8-way system for the University of Maryland, to allow the continuation of research efforts on IBM systems. This system will provide access to resources for the project teams at both sites.

Memory System Performance (Jeff Hollingsworth, UMD)

Traditionally, performance data is tracked by associating it with control structures. Times and counts are displayed per function, loop, block, or line. Our experience with memory profiling has shown that memory data is better tracked by data structure. Profiling the total time spent operating on a particular array, structure, or cache line can provide critical insights into memory performance. Combining these data structure views of performance with hardware counters (such as cache misses, memory access counts, etc.) will provide a powerful handle on application performance.
The key issue with correlating memory behavior with specific data structures is how to collect this information efficiently. Our initial approach was to use software instrumentation to instrument memory-related operations. While this approach provided accurate information about memory behavior, the overhead (a 10-100x slowdown of programs) was too high to use for large programs. Recently we have been concentrating on sampling-based techniques to reduce this overhead. We have also been evaluating the potential of various extensions to current hardware instrumentation. In particular, we have found that by extending current hardware counters to allow measuring cache misses within a selectable region of memory, we can accurately isolate performance problems to specific data structures.
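The idea of attributing cache misses to data structures can be illustrated with a minimal sketch. It assumes that a sampling mechanism has already delivered a list of effective addresses at which misses occurred, and that the address range of each data structure is known. The region table, the structure names, and the sample addresses below are hypothetical and exist only for illustration; this is not the project's implementation.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical address map: (start address, size in bytes, name).
# These values are invented for the example.
REGIONS = [
    (0x1000, 0x400, "grid"),
    (0x2000, 0x100, "particles"),
    (0x3000, 0x800, "lookup_table"),
]


def build_index(regions):
    """Sort regions by start address so lookups can use binary search."""
    ordered = sorted(regions)
    starts = [start for start, _, _ in ordered]
    return starts, ordered


def attribute(address, starts, ordered):
    """Return the name of the region containing address, or None."""
    i = bisect_right(starts, address) - 1
    if i < 0:
        return None  # address lies below every known region
    start, size, name = ordered[i]
    return name if address < start + size else None


def misses_by_structure(sampled_addresses, regions):
    """Count sampled miss addresses per data structure."""
    starts, ordered = build_index(regions)
    counts = Counter()
    for addr in sampled_addresses:
        counts[attribute(addr, starts, ordered) or "<unknown>"] += 1
    return counts


# Hypothetical sampled miss addresses.
samples = [0x1004, 0x1010, 0x2008, 0x13FF, 0x1400, 0x3000, 0x0FFF]
report = misses_by_structure(samples, REGIONS)
for name, count in report.most_common():
    print(f"{name:>14}: {count}")
```

Because only sampled addresses are processed, the cost of the lookup is paid per sample rather than per memory operation, which is the motivation for sampling over full software instrumentation.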
The hardware counters available on the Power4 system are capable of providing the type of interrupt (plus effective address) information required to gather data structure specific performance data. However, the current device driver for the performance counters does not allow this type of data to be gathered. We would use the equipment requested as part of this SUR grant to develop a prototype implementation of this functionality. We plan to use the LPAR feature of the Power4 systems to have an AIX and a Linux partition, so that we can develop and test this prototype implementation running on both operating systems on a single node.
A second area of interest in memory system performance has to do with gathering data to understand the fine-grained interactions between application programs and the memory hierarchy. While Prof. Hollingsworth was on sabbatical at IBM Research, he started work on the Simulation Infrastructure to Guide Memory Analysis (SIGMA), along with Luiz DeRose and Kattamuri Ekanadham. The initial approach uses run-time instrumentation to extract a detailed representation of the memory reference pattern of the application. The data is gathered using machine-specific static instrumentation inserted into the assembly language representation of the program. We propose to change the instrumentation system to use Dyninst, which would allow both dynamic selection of the loops to instrument and platform independence. This effort will require extending Dyninst to allow it to instrument load and store operations. Currently, SIGMA works with sequential programs and MPI programs. In conjunction with the OpenMP support for Dyninst and Paradyn (described below), we propose to extend SIGMA to work with OpenMP programs. Currently we support Dyninst under PowerPC/AIX and x86/Linux. In this work we will investigate porting Dyninst to Power/Linux.

Java Memory Model (Bill Pugh, UMD)

Java has integrated multi-threading to a far greater extent than most programming languages. It is also one of the few languages that specifies and requires safety guarantees for improperly synchronized programs. Understanding these issues has turned out to be far more subtle and difficult than was previously thought. The existing specification makes guarantees that prohibit standard and proposed compiler optimizations; it also omits guarantees that are necessary for safe execution of much existing code. Some guarantees that are made (e.g., type safety) raise tricky implementation issues when running unsynchronized code on SMPs with weak memory models.
As part of devising a replacement memory model for Java, we need to investigate the implementation costs and benefits gained from different variations in the model, and work out possible implementation strategies. The requested system will allow us to conduct our experiments on Power4 SMPs and ensure that the performance issues for Power4 architectures are considered.

OpenMP Support (Bart Miller, U. Wisconsin)

OpenMP is becoming an increasingly important programming model for parallel programs. It provides a portable interface to the shared-memory idiom, while providing many of the loop-level parallelism primitives. It is crucial to provide OpenMP support for the wide variety of tools that are being built and will be built on Dyninst. Dyninst support for OpenMP consists of instrumentation of three major components: threading, OpenMP control constructs, and synchronization primitives. There is currently limited support for threaded applications in Paradyn, and this is an area of active development (threading is only supported on the SPARC/Solaris platform, with porting to AIX currently underway). In addition, Paradyn is the only current user of this technology. Per-thread instrumentation must be added to the Dyninst API.
The OpenMP control constructs (such as PARALLEL, DO, and SECTIONS) also need to be supported. This support requires that Dyninst identify the OpenMP objects, such as loops and parallel sections. Identification requires that we understand the run-time code and library structure of the OpenMP system. We will pick two standard reference versions of OpenMP and initially provide support for these. After identification, these constructs must be made visible in the interface. The current Dyninst code objects include module, function, and basic block. The new OpenMP objects will be made visible, nameable, and instrumentable in the extended Dyninst API.
Instrumenting OpenMP synchronization primitives (such as BARRIER or CRITICAL) is a natural extension of the kind of synchronization operations that we already handle. Access to these operations is typically done through a well-defined library interface. This type of design provides a natural point of access for dynamic instrumentation.



Project 2) Exploration of IBM T221 hi-res display by HCIL

Ben Bederson and Ben Shneiderman, UMD
Kai Schleupen and John Karat, IBM Research

We request that IBM provide the University of Maryland's Human-Computer Interaction Lab with an IBM T221 hi-res display (and driver cards) through the IBM SUR program to support our research in information visualization, novel interaction techniques, and human perception.

Our lab has a long history of creating novel and effective visualization and query systems that help people to find and understand information faster than with traditional interfaces. We have so far relied on commercially available ~100 DPI displays. These displays limit our ability to take full advantage of the human visual system, since the human eye and visual system are capable of perceiving much higher density information. We would like to understand how our visualization techniques can be improved when adapted to high resolution displays, such as IBM's T221 ~200 DPI display.

We currently are working on a number of visualization projects that could take advantage of such a display. Two such projects are:

PhotoMesa
http://www.cs.umd.edu/hcil/photomesa

PhotoMesa is a zoomable image browser that allows users to see hundreds or thousands of images simultaneously. We expect that higher resolution displays would increase the number of images that a person could effectively browse on a single display. With the T221 display, we would run experiments to understand how image search and browsing performance changes based on the display quality.

Fisheye distortion
http://www.cs.umd.edu/hcil/fisheyemenu
http://www.cs.umd.edu/hcil/fishcal

We have applied "fisheye distortion" techniques to linear lists for menu selection (Fisheye Menu) and to grids for calendar access (FishCal), and have shown that these techniques can improve user performance and satisfaction on a range of tasks. However, this technique works by shrinking down peripheral information so it is very small and takes up minimal space on the display. With traditional low-resolution displays, these tiny representations are difficult or impossible for users to comprehend. We expect that a high resolution display would improve the ability of users to comprehend the tiny representations of information inherent in a fisheye display. With a T221 display, we would be able to run experiments to understand whether high resolution does in fact improve users' performance with fisheye interfaces.

We are aware that due to the high resolution of the T221 display, performance of rapidly changing animated displays is limited. However, while those technical problems are being solved, we can still get started now on running experiments and designing our software for the high resolution display. We can do this because animation is only one aspect of our interfaces. We can start by running experiments on less dynamic displays of information to understand how the ability of people to perceive small information affects their overall understanding of the information display.
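The way fisheye distortion shrinks peripheral items can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm used in Fisheye Menu or FishCal: items within a small window around the focus are drawn at full size, and sizes fall off linearly with distance until they reach a minimum. The function name, the parameters, and the sizes are all hypothetical.

```python
def fisheye_sizes(n_items, focus, max_size=12.0, min_size=2.0,
                  focus_len=2, step=2.0):
    """Return a display size for each item in a linear list.

    Items within focus_len positions of the focus get max_size.
    Beyond that, the size drops by step per position and is
    clamped at min_size. All parameters are illustrative.
    """
    sizes = []
    for i in range(n_items):
        distance = abs(i - focus)
        if distance <= focus_len:
            sizes.append(max_size)
        else:
            shrunk = max_size - step * (distance - focus_len)
            sizes.append(max(min_size, shrunk))
    return sizes


# A 15-item list with the focus on the middle item.
sizes = fisheye_sizes(15, focus=7)
total_height = sum(sizes)
uniform_height = 15 * 12.0
print(sizes)
print(f"fisheye: {total_height}, uniform: {uniform_height}")
```

In this sketch the items farthest from the focus are drawn at one sixth of full size, which is where display resolution matters: on a higher density display, the same physical size is rendered with more pixels, so the smallest items have a better chance of remaining legible.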

In general, we would like to explore how visualization techniques can be adapted to take advantage of the high resolution displays since they will clearly be very common in the future. We expect that our research would take a number of forms including:

  • Modifying existing visualization and interaction techniques, or building new ones, to work with the T221 display.
  • Performing quantitative empirical studies to understand the effectiveness of these techniques.
  • Performing quantitative empirical studies to develop a better understanding of basic human perceptual performance, and how it depends on display quality.

The Human-Computer Interaction Lab (HCIL) at the University of Maryland has a mission to design, implement, and evaluate new interface technologies that are usable, useful, and appealing to a broad cross-section of people. We believe it is critical to understand how the needs and dreams of people can be reflected in our future technologies. To this end, the HCIL develops advanced user interfaces and design methodology. Our primary activities include collaborative research, publication and the sponsorship of open houses, workshops and symposiums.

The HCIL is an interdisciplinary lab composed of faculty and students from Computer Science, Education, Psychology and Information Studies. Our current work includes new approaches to information visualization, interfaces for digital libraries, multimedia resources for learning communities, zoomable user interfaces (ZUIs), technology design methods with and for children, and instruments for evaluating user interface technologies.

Organizationally, the HCIL is part of the University of Maryland Institute for Advanced Computer Studies (UMIACS), an interdisciplinary institute whose goal is to foster research combining computer science and other fields.


Project 3) MALACH: Multilingual access to large spoken archives

IBM Research stakeholders (T.J. Watson, Human Language Technologies)
Michael Picheny, Bhuvana Ramabhadran, Todd Ward, Martin Franz

University of Maryland (UMIACS)
Douglas Oard, Bonnie Dorr, Philip Resnik, David Doermann

IBM Research and the University of Maryland are partnered with the Survivors of the Shoah Visual History Foundation (VHF) and Johns Hopkins University to develop technology to improve access to large collections of conversational speech. The National Science Foundation is funding personnel and travel ($7.5 million over 5 years), and the VHF is providing access to the world's largest coherent collection of digitized audio/video interviews (116,000 hours). Maryland and IBM Research are working together to develop interactive search technology for these challenging materials. Rights management agreements with VHF require that all project data be stored on devices that are isolated from the public networks.

MALACH is a very high-profile project, funded by the NSF last year, to transcribe, index, and provide user access to a large quantity of recorded interviews with survivors, liberators and rescuers of the Holocaust. In particular, our plan is to capitalize on the unique characteristics of the Survivors of the Shoah Visual History Foundation's (VHF) multimedia digital archive (180 terabytes) of oral histories, consisting of over 116,000 hours of interviews with 52,000 survivors, liberators, rescuers and witnesses of the Holocaust, recorded in 32 languages. The end results would be:
1) Ability to do automated transcription of emotional and accented speech in multiple languages in the presence of whispering and uncued language switching with a quality high enough to support metadata generation and free-text search;
2) Semiautomatic techniques for mapping thesaurus categories across languages to create multilingual thesauri;
3) Enhanced searching capability with an intuitive end-user interface for search and exploration, including event timelines to support navigation within and across interviews and cross-language retrieval (search in one language to retrieve interviews in many languages);
4) User needs analysis to support all of the above.

The University of Maryland is specifically involved in items (3) and (4) and needs this equipment to accelerate user studies and to give us better guidance, for our own research, on which parts of transcription and indexing are weakest. This project is expected to provide a quantum leap in speech recognition and information retrieval technologies, both of which are major focus areas of IBM.

