Current Projects
We are engaged in an ongoing effort to explore cloud computing, particularly as it relates to massive data analytics with platforms such as MapReduce. Highlights of our effort include:
Ranking cascades provide a new approach to text retrieval, where document ranking is broken into a finite number of distinct stages. Each stage considers successively richer and more complex features, but over successively smaller candidate document sets; in other words, retrieval is viewed as a multi-stage progressive refinement problem. See this project page for more details.
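The multi-stage refinement idea can be illustrated with a minimal Python sketch. This is not the project's implementation: the function names (`cascade_rank`, `term_overlap`, `proximity`), the two toy features, and the cutoffs are all assumptions chosen for illustration. Each stage re-scores only the candidates that survived the previous stage, using a costlier feature, and keeps a smaller set.

```python
from typing import Callable, List, Sequence, Tuple

# A stage pairs a scoring function with the number of candidates to keep.
# (Hypothetical structure, chosen only for this illustration.)
Stage = Tuple[Callable[[str, str], float], int]


def cascade_rank(query: str, docs: Sequence[str], stages: Sequence[Stage]) -> List[str]:
    """Run each stage over the surviving candidates, pruning after every stage."""
    candidates = list(docs)
    for score, keep in stages:
        ranked = sorted(candidates, key=lambda d: score(query, d), reverse=True)
        candidates = ranked[:keep]  # later stages see a smaller candidate set
    return candidates


def term_overlap(query: str, doc: str) -> float:
    """Cheap feature: number of distinct query terms that appear in the document."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def proximity(query: str, doc: str) -> float:
    """Costlier feature: inverse of the smallest gap between two different query terms."""
    q_terms = set(query.lower().split())
    hits = [(i, t) for i, t in enumerate(doc.lower().split()) if t in q_terms]
    gaps = [j - i for (i, a), (j, b) in zip(hits, hits[1:]) if a != b]
    return 1.0 / min(gaps) if gaps else 0.0


docs = [
    "cloud computing with mapreduce for large data",
    "text retrieval at web scale using mapreduce",
    "retrieval of images and later some text about mapreduce",
    "cooking recipes for pasta",
]

# Stage 1 keeps 3 documents using the cheap feature; stage 2 keeps 1 using the costlier one.
stages = [(term_overlap, 3), (proximity, 1)]
print(cascade_rank("mapreduce text retrieval", docs, stages))
```

In this toy setup the expensive proximity feature is computed for only three of the four documents, which is the point of the cascade: cost per document rises from stage to stage while the number of documents scored falls.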
Recently, we have begun a project to explore highly scalable MapReduce algorithms for linguistic modeling within a Bayesian framework, making use of variational inference to achieve a high degree of parallelization on web-scale datasets. See this NSF project page for more details.
Ivory is a Hadoop toolkit for distributed text retrieval that features a retrieval engine based on Markov Random Fields. The project is focused on the challenges of indexing and retrieval algorithms at web scale. See this NSF project page for more details.
In an ongoing effort, we have been exploring the intersection of large-scale text retrieval and statistical machine translation. One thread has been scaling up iterative machine learning algorithms to larger and larger datasets. Another thread has been the application of IR techniques to automatically extract bilingual training data. See this project page for a completed project from the Google/IBM Academic Cloud Computing Initiative and NSF's CLuE program.









