Carpuat Receives NSF CAREER Award to Improve Multilingual Text Analysis

Fri Jan 12, 2018

A University of Maryland expert in computational linguistics has won a National Science Foundation (NSF) Faculty Early Career Development (CAREER) award for a project designed to improve multilingual text analysis.

Marine Carpuat, an assistant professor of computer science in the Computational Linguistics and Information Processing (CLIP) Laboratory, is principal investigator of the NSF award, which is expected to total $550,000 over five years.

The funding supports efforts by Carpuat to develop computational representations and methods that compare and contrast the meaning of text in different languages. The goal is to advance new technologies that better support cross-lingual communication and cross-cultural understanding, says Carpuat, who has an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS).

Cross-lingual work in natural language processing currently relies on the assumption that a source text and its translation are equivalent in meaning within the two languages, and that they can be decomposed into smaller equivalent units by aligning sentences, phrases and words.

In practice, Carpuat says, content conveyed in two languages is rarely exactly equivalent: the same topics or events can be discussed from widely different perspectives. Even faithful translations can be hard to understand without the appropriate linguistic and cultural background knowledge, she adds.

“We want to build upon distinct bodies of our previous work involving machine translation and semantic analysis,” Carpuat says. “We anticipate that the CAREER project will produce novel techniques to detect and explain nuanced differences between words and sentences in different languages.”

Carpuat says the CAREER project will integrate research with education through activities, motivated by the practical problem of translating Wikipedia pages, that illustrate the challenges of language technology developed on inevitably biased data. These activities will target high-school and undergraduate students outside of computer science, as well as computer scientists of diverse backgrounds at the undergraduate and graduate levels.

“This innovative research by Marine Carpuat will ultimately have an impact across a wide range of people that rely upon the accurate translation of languages, including second language learners, volunteer translators, and security analysts,” says Amitabh Varshney, professor of computer science and director of UMIACS.

***

“CAREER: Semantic Divergences Across the Language Barrier” is supported by NSF grant #1750695 from the NSF’s Division of Information and Intelligent Systems.

PI: Marine Carpuat, assistant professor of computer science with an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS).

About the CAREER award: The Faculty Early Career Development (CAREER) Program is an NSF activity that offers the foundation’s most prestigious awards in support of junior faculty who exemplify the role of teacher-scholars through outstanding research, excellent education and the integration of education and research within the context of the mission of their organization.

About CLIP: The Computational Linguistics and Information Processing (CLIP) Laboratory at the University of Maryland is engaged in designing algorithms and building systems that allow computers to effectively and efficiently perform language-related tasks. CLIP is one of 14 labs and centers in UMIACS.