The CLIP Colloquium Series presents...


Nora, MONK, and Naive Bayes Pattern Recognition: What Literary Scholars Do With Bags of Words

Matt Kirschenbaum and Catherine Plaisant (University of Maryland)
Wednesday, November 29, 2006, 11:00am, AVW 2120

This talk will discuss two research projects that represent collaborations between the Maryland Institute for Technology in the Humanities (MITH) and the Human-Computer Interaction Lab (HCIL), both as part of larger multi-institutional efforts: Nora, whose work is rapidly concluding, and MONK, whose work is about to begin.

Over the last decade, many millions of dollars have been invested in creating digital library collections: at this point, terabytes of full-text humanities resources are publicly available on the Web. Nora was a two-year multi-institutional project funded by the Andrew W. Mellon Foundation (PI: John Unsworth, University of Illinois) with the goal of producing software for discovering, visualizing, and exploring significant patterns across large collections of full-text humanities resources in existing digital libraries. Locally, MITH and HCIL have been working closely with humanities scholars to assess what use, if any, text mining has in humanities research, and how usable systems can be designed to support text mining for non-expert users. We will discuss results and demonstrate prototypes.

MONK (Metadata Offer New Knowledge) represents the next phase of this research, and will work towards the intermediate objective of a corpus composed of 1000 novels drawn from 18th- and 19th-century British and American literature. The objective here is to combine token-level analysis with the rich descriptive metadata characteristic of humanities text collections.

Nora and MONK share the assumption that what matters in humanities research is less often ground truth and actionable outcomes than provocation and argumentation. We will therefore discuss "provocation" as an interpretive framework for text mining operations.

About the Speakers

Matt Kirschenbaum is an Assistant Professor of English and Associate Director of MITH at the University of Maryland.

Catherine Plaisant is Associate Research Scientist and Associate Director of the Human-Computer Interaction Laboratory at the University of Maryland.


This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.