“Entity Linking for Conversational Content”

Thu Jan 28, 2016 11:00 AM

Location: LTS Auditorium, 8080 Greenmead Drive

Doug Oard
College of Information Studies and UMIACS

Entity linking seeks to determine which known entity is referred to by each mention found in some use of human language. When the referent entity is not known, an entity linking system should recognize such cases (NIL detection) and determine which are co-referent (NIL clustering). The relatively low entropy that is typical of many uses of human language makes this a fairly easy problem, with accuracies routinely exceeding 90 percent.

A key limitation of that result is that most of the work done on this problem has focused on content that was created with dissemination in mind. What we might call “conversational content” (content in which the parties to an interaction are known to each other and draw on shared context that may not contemporaneously be made explicit) poses new challenges. Early experiments with email indicate that features derived from the communication graph are important for recovering the context needed to make good links. But that work has two key limitations: it has so far addressed only person entities, and it has addressed only email, whereas the vast majority of conversational content is spoken, which introduces further challenges.

In this talk, I will review the work we have done on linking person entities in two email collections. I’ll then draw on that experience to explain how we plan to extend the work to other entity types. Finally, I’ll describe our current work with a freely redistributable test collection that contains both email and conversational telephone speech.

Doug Oard is a professor at UMD with joint appointments in the College of Information Studies (Maryland’s iSchool) and the University of Maryland Institute for Advanced Computer Studies (UMIACS).

His research interests center around the use of emerging technologies to support information seeking by end users.

Oard earned his doctorate in electrical engineering from UMD.