TwitterStand: Separating the Wheat from the Chaff in Breaking News

Twitter is an electronic medium that allows a large user population to communicate with one another simultaneously. Inherent to Twitter is an asymmetric relationship between friends and followers, which gives rise to an interesting social-network-like structure among its users. Twitter messages, called tweets, are limited to 140 characters and are therefore usually very focused. Twitter is becoming the medium of choice for keeping abreast of rapidly breaking news. This project explores the use of tweets to build a news-processing system.

Better Network Modules: New Tools for Protein Network Analysis

The University of Maryland College Park is awarded a grant to develop new algorithms and a suite of software tools, based on a general and flexible definition of a "network module", to extract meaningful biological clusters from noisy and incomplete protein-protein interaction data. Recently developed high-throughput techniques are being used to sample protein-protein interactions from many organisms, creating a wealth of data that must be analyzed computationally.

Robust Image Matching with Deformation

This project develops new, effective distance metrics for comparing two images. These metrics account for two effects. First, pixels can change their position as one image deforms into another. Second, pixels may change their intensity; in many vision problems, such intensity changes are primarily due to lighting variation. The research team first addresses the effect of illumination changes, which enables the development of a new, powerful, and robust distance that accounts for the effects of lighting variation in an image.
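
To make the illumination effect concrete, the sketch below uses a standard stand-in, 1 minus normalized cross-correlation (NCC), which does not change when every pixel intensity is rescaled by a positive gain and shifted by a constant offset. This is not the project's metric, which the text does not specify. It ignores the deformation effect entirely, and representing images as flat, equal-length lists of grayscale values is an assumption made for brevity.

```python
import math
from typing import Sequence


def ncc_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - NCC for two equal-size grayscale images given as flat lists.

    Stand-in illustration only (not the project's metric): the result is
    unchanged under b' = g * b + o with gain g > 0 and any offset o, so a
    global brightness or contrast change does not affect the distance.
    Range is [0, 2]: 0 = perfectly correlated, 2 = perfectly anti-correlated.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Subtracting the mean removes any constant intensity offset.
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    norm_a = math.sqrt(sum(x * x for x in da))
    norm_b = math.sqrt(sum(y * y for y in db))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("constant image: correlation is undefined")
    # Dividing by the norms removes any positive gain (contrast) factor.
    ncc = sum(x * y for x, y in zip(da, db)) / (norm_a * norm_b)
    return 1.0 - ncc


img = [10.0, 20.0, 30.0, 40.0]
relit = [2.0 * p + 5.0 for p in img]  # same pixels, brighter and higher contrast
print(ncc_distance(img, relit))        # close to 0.0, up to floating-point rounding
print(ncc_distance(img, img[::-1]))    # close to 2.0: reversed intensity pattern
```

Because the correlation is computed on mean-centred, norm-scaled values, a global change in brightness or contrast cancels out; handling pixels that change position would require an additional alignment step that this sketch omits.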

Scalable Geometric & High-Dimensional Algorithms

The steadily decreasing cost of commodity computing, together with a growing and increasingly widespread demand for computing power fueled largely by the massive volumes of data generated by powerful search engines, has led to a reexamination of traditional methods for solving problems that involve search. As more and more applications become web services (i.e., "cloud applications" in technology parlance), there is an increasing expectation that answers be obtained and delivered in real time, at a scale capable of serving millions of users simultaneously.

Algorithms for the Analysis of Data from Massively Parallel Genome Sequencing

New-generation DNA sequencing technologies are revolutionizing modern biological research. Scientists can now generate the rough equivalent of an entire human genome (~3 billion base pairs of DNA) in just a few days with a single sequencing instrument. Until recently, such amounts of data could be generated only at large genome centers using hundreds of sequencers. The analysis of these data is complicated by their size: a single run of a sequencing instrument yields terabytes of information, often requiring a significant scale-up of existing computational infrastructure.

Similarity Criteria Issues in Similarity Retrieval

The wide use of the internet, coupled with the steadily decreasing cost of computing and storage, has expanded the data that users expect to retrieve beyond simple numeric and alphanumeric values to include images, audio, and video, where the retrieval criterion is one of similarity. An inherent difficulty with similarity retrieval is deciding on the criterion for similarity.
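
Why the choice of criterion matters can be seen in a small example: the same query over the same collection returns a different nearest item under Euclidean distance than under Manhattan distance. The feature vectors, item names, and helper functions here are hypothetical and serve only as an illustration; they are not taken from the proposal.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def euclidean(p: Vector, q: Vector) -> float:
    # Straight-line (L2) distance between two feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p: Vector, q: Vector) -> float:
    # Sum of per-coordinate absolute differences (L1 distance).
    return sum(abs(a - b) for a, b in zip(p, q))


def nearest(query: Vector, items: Dict[str, Vector],
            dist: Callable[[Vector, Vector], float]) -> str:
    # Return the name of the stored item closest to the query under `dist`.
    return min(items, key=lambda name: dist(query, items[name]))


# Hypothetical feature vectors for two stored objects.
items = {"A": (3.0, 3.0), "B": (5.0, 0.0)}
query = (0.0, 0.0)

print(nearest(query, items, euclidean))  # A  (about 4.24 vs 5.0)
print(nearest(query, items, manhattan))  # B  (5.0 vs 6.0)
```

Neither answer is wrong; each distance encodes a different notion of "similar", which is exactly the decision a similarity-retrieval system has to make.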

Intellectual Merit

This proposal explores issues involved in retrieval based on several similarity criteria:

GRID, Public and GPU Computing

The University of Maryland College Park is awarded a grant to create tools that expand the computing power freely available to ATOL (Assembling the Tree of Life) and other phylogenetic researchers. The central idea is to combine Grid and GPU (graphics processing unit) computing to take better advantage of diverse computing resources, particularly the existing desktop processing capacity available through public computing.

Graphs to Diversity: Extracting Genomic Variation from Sequence Graphs

Recent advances in genome sequencing technologies have enabled bacteria to be sequenced directly from the environment, providing a broader view of bacterial diversity than was previously possible. Studies of environmental samples have revealed complex communities containing many previously unknown species and have uncovered a large amount of genetic variation and diversity, even among closely related strains.
