TREX & ACE

The RDF Extractor & Automatic Coding Engine

Motivation

With the enormous amount of textual information now available online, there is an increasing demand – especially in the national security community – for tools capable of automatically extracting certain types of information from massive amounts of raw text data.

For example, US forces in Afghanistan are continuously learning about the tribes on the Pakistan-Afghanistan border. Given one of these tribes (e.g., the Afridi), they would like to learn the answers to questions such as: What is the source of economic support for the tribe? Which other tribes has it had conflicts with, and over what issues did these conflicts arise? Has it been involved in violence against the US? Against other tribes in the region?

Alternatively, if we wish to run a real-time “violence watch” around the world, we need to define what constitutes a violent event and which attributes of a violent event are of interest. For instance, we may wish to identify the victims, number of dead, number of injured, perpetrators, location, time, and weapon used.

TREX

TREX is a generic, domain-independent Information Extraction system. Unlike many traditional approaches, TREX does not rely on domain-specific knowledge or site-specific features, but rather takes as input a schema describing the information to be extracted.

Once the system has been provided with a training corpus of annotated sentences, it derives extraction rules from that corpus and then applies them to extract information from previously unseen documents.
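To make the train-then-apply workflow concrete, here is a deliberately simple sketch. It assumes that a "rule" is just the pair of words immediately surrounding an annotated value (a classic context-pattern approach); this is a stand-in chosen for illustration, not the rule language TREX actually uses. The property name `numberOfDead`, the sample sentences, and the functions `learn_rules` and `apply_rules` are all hypothetical.

```python
from collections import Counter


def _context(tokens, i):
    """Return the (left, right) neighbours of token i, lowercased, with sentence markers."""
    left = tokens[i - 1].lower() if i > 0 else "<s>"
    right = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return left, right


def learn_rules(corpus, min_support=1):
    """Derive (property, left word, right word) rules from annotated sentences.

    Each corpus entry is a hypothetical (tokens, property, index) annotation
    marking tokens[index] as the value of `property`.
    """
    counts = Counter()
    for tokens, prop, i in corpus:
        left, right = _context(tokens, i)
        counts[(prop, left, right)] += 1
    # Keep only contexts observed at least `min_support` times.
    return {rule for rule, n in counts.items() if n >= min_support}


def apply_rules(rules, tokens):
    """Return (property, value) pairs for tokens whose context matches a learned rule."""
    found = []
    for i, tok in enumerate(tokens):
        ctx = _context(tokens, i)
        for prop, left, right in rules:
            if (left, right) == ctx:
                found.append((prop, tok))
    return found


# Hypothetical annotated training sentences.
corpus = [
    ("The blast killed 12 people in Kabul".split(), "numberOfDead", 3),
    ("A bomb killed 4 people near Peshawar".split(), "numberOfDead", 3),
]
rules = learn_rules(corpus, min_support=2)
print(apply_rules(rules, "Gunmen killed 7 people on Tuesday".split()))
```

Running it prints `[('numberOfDead', '7')]`: the context "killed … people" was seen twice in training, so it survives the `min_support=2` threshold and fires on the unseen sentence. Real systems generalize far beyond a single neighbouring word, but the division into a learning step and an application step is the same.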

The information to be extracted is described using RDF Schema, and the output of the Extraction Engine is in RDF format, i.e., a set of (subject, property, object) triples. RDF (Resource Description Framework) is a web standard defined by the World Wide Web Consortium (W3C).
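As an illustration of what schema-conformant output might look like, the sketch below encodes a hypothetical `ex:ViolentEvent` class, whose properties echo the attributes listed in the Motivation section, using RDF Schema-style `rdfs:domain`/`rdfs:range` pairs, and checks a set of extracted triples against it. It uses plain Python tuples rather than an RDF library, and the `ex:` namespace, property names, and resource identifiers are made up; TREX's actual schemas and output may differ. Note also that in RDF Schema proper, domain and range declarations license inferences rather than enforce constraints, so treating them as validation checks here is a simplification.

```python
# Hypothetical RDF Schema fragment: property -> (rdfs:domain, rdfs:range).
SCHEMA = {
    "ex:victim": ("ex:ViolentEvent", "ex:Person"),
    "ex:numberOfDead": ("ex:ViolentEvent", "xsd:integer"),
    "ex:location": ("ex:ViolentEvent", "ex:Place"),
    "ex:weapon": ("ex:ViolentEvent", "xsd:string"),
}

RDF_TYPE = "rdf:type"


def types_of(triples):
    """Map each subject to the set of classes asserted for it via rdf:type."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return types


def violations(triples, schema=SCHEMA):
    """Return the triples whose property, subject, or object does not fit the schema."""
    types = types_of(triples)
    bad = []
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue  # type assertions themselves are not checked
        if p not in schema:
            bad.append((s, p, o))  # property not declared in the schema
            continue
        domain, rng = schema[p]
        if domain not in types.get(s, set()):
            bad.append((s, p, o))  # subject is not an instance of the domain class
        elif rng == "xsd:integer" and not o.isdigit():
            bad.append((s, p, o))  # literal is not a non-negative integer
        elif rng.startswith("ex:") and rng not in types.get(o, set()):
            bad.append((s, p, o))  # object is not an instance of the range class
    return bad


# Hypothetical triples an extractor might emit for one sentence.
extracted = [
    ("ex:event1", "rdf:type", "ex:ViolentEvent"),
    ("ex:event1", "ex:numberOfDead", "7"),
    ("ex:event1", "ex:location", "ex:Kabul"),
    ("ex:Kabul", "rdf:type", "ex:Place"),
    ("ex:event1", "ex:weapon", "car bomb"),
]
print(violations(extracted))
```

For the triples above the check prints `[]`; adding `("ex:event1", "ex:numberOfDead", "seven")` would flag that triple, because its object is not an integer literal.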

Automatic Coding Engine

There are applications where the questions that need to be answered are far more complex than those TREX can handle. For example, several groups of political scientists around the world monitor political organizations and/or conflicts. In most such efforts this task is still performed manually by human coders and is extremely time-consuming, so the need for automation is enormous. The ACE effort is developing a prototype system and algorithms to perform such extractions automatically and in real time from massive amounts of data.

Project lead: Prof. V.S. Subrahmanian

Points of Contact: Prof. V.S. Subrahmanian, Dr. Massimiliano Albanese

Last updated: November 20, 2009.

Project Contributors

Special Thanks

Publications

This page will be updated as our work enters print. For information about receiving draft publications, technical reports, and conference presentations, please do not hesitate to contact team members.

The following sections may include links to restricted access material. Please do not hesitate to contact a group member for instructions regarding how to obtain a username and password.

Presentations


Live Demonstration