The Association for Machine Translation in the Americas
AMTA-2002 Conference
Interlingua Reliability Workshop
Interlingua Coding Experiment


Computational Linguistics and
Information Processing Lab
University of Maryland College Park

Introduction

This document contains the information needed to participate in the Interlingua Coding Experiment section of the Interlingua Reliability Workshop: a specification of an interlingual language, examples of annotated sentences, and 20 parsed sentences to annotate. During the workshop, participants will discuss issues such as annotation differences and measures of inter-annotator agreement.

NOTE: We ask that participants conform to the specifications below, rather than applying a different framework. If you would like to apply your own framework, please create two submissions, one that is compatible with the specifications below, and one that conforms to your own framework.

The Interlingual Representation

The interlingual representation used here is a simple dependency-tree structure with paired syntactic-thematic-role links. This representation includes both syntactic and thematic information, such as parts of speech and syntactic roles in addition to thematic roles. This information can eventually be used to model the syntactic-semantic interface. The structure of this Interlingua is described in detail later in this document. To get a feeling for the representation, here is a simple example for the sentence John broke the old vase:
EXAMPLE 1:
<doc doc_id="EX-1" sys_id="Nizar">
<tree>
  1               john      PN     2     sbj:agt  [ word:John ]
  2              break       V     *       *:*    [ word:broke ]
  3                the       D     5     mod:mod  [ word:the ]
  4                old      AJ     5     mod:mod  [ word:old ]
  5               vase       N     2     obj:thm  [ word:vase ]
  6                  .      PX     2     mod:nil  [ word:. ]
</tree>
</doc>
The sentence, John broke the old vase, is specified as document "EX-1" with "Nizar" as the ID of the "system" that annotated it. The tree representing the sentence is enclosed between the tags <tree> and </tree>. The left-most column indicates the tree node number, which corresponds to the surface order of the word represented by the node. The second and third columns indicate the word lexeme and its part of speech respectively. The fourth column indicates the parent node of the node specified in a given row. For example, the parent, or dominating node, of John(1) and vase(5) is break(2). The fifth column indicates the syntactic-plus-thematic relationship of the node in a given row to its parent. For example, John(1) is syntactically the subject (sbj) and thematically the agent (agt) of the verb break(2). Similarly, vase(5) is syntactically the object (obj) and thematically the theme (thm) of the verb break(2). If the sentence were The vase broke, vase would still be the theme although it is syntactically a subject. This representation pays less attention to detailing the semantics of words other than arguments and modifiers of verbs. For example, in the sentence above, old is marked as a syntactic modifier and thematic modifier (versus a more detailed description such as age). Punctuation in this sentence is considered a non-contributor semantically, so it receives the thematic role nil. The top node of the dependency tree (i.e. its root) is indicated by *:*. The last column indicates features of the different nodes. Here, the surface word is included as a feature.
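The six columns described above can be read mechanically. The following is a minimal Python sketch of one way to do so, not part of the workshop materials: the names Node, NODE_RE and parse_node are hypothetical, and the sketch assumes that feature values contain no spaces.

```python
import re
from typing import NamedTuple, Optional


class Node(NamedTuple):
    number: int             # column 1: surface position of the word
    lexeme: str             # column 2
    pos: str                # column 3
    parent: Optional[int]   # column 4: None when the column holds "*"
    syn_role: str           # column 5, before the colon
    theta_role: str         # column 5, after the colon
    features: dict          # column 6: key:value pairs


# number, lexeme, POS, parent, syn:theta, [ features ]
NODE_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+|\*)\s+([^\s:]+):(\S+)\s+\[(.*)\]\s*$"
)


def parse_node(line: str) -> Node:
    """Parse one row of a <tree> block into a Node."""
    match = NODE_RE.match(line)
    if match is None:
        raise ValueError(f"not a node line: {line!r}")
    number, lexeme, pos, parent, syn, theta, feats = match.groups()
    features = {}
    for item in feats.split():
        # Split on the first colon only, so "word:." and "word:," work.
        key, _, value = item.partition(":")
        features[key] = value
    return Node(int(number), lexeme, pos,
                None if parent == "*" else int(parent),
                syn, theta, features)


node = parse_node("  1               john      PN     2     sbj:agt  [ word:John ]")
root = parse_node("  2              break       V     *       *:*    [ word:broke ]")
print(node.theta_role, root.parent)
```

Representing the root's parent as None rather than the string "*" is a design choice of this sketch; it keeps the parent column a single numeric type for later tree traversal.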

Representation Definition

Document Structure

Every sentence is considered a document with a specific doc_id. Different analyses of the sentences are distinguished by their sys_id. A dependency tree is represented as a list of node information contained between the tags <tree> and </tree>. The following is a BNF description of the format of the sentences:

<DOC> ::= <doc doc_id="<doc-id>" sys_id="<sys_id>">
          <TREE>
          </doc>
<TREE> ::= 
  <tree>
  {<NODE-NUMBER> <lexeme> <POS> <PARENT> <SYN-ROLE>:<THETA-ROLE> <FEATURES>}*
  </tree>

<NODE-NUMBER> ::= <num>
  # node numbers follow the surface order of the words

<POS> ::= AJ || AV || AX || C || D || N || Num || P || PN || PX || V 
  # AJ Adjective
  # AV Adverb
  # AX Auxiliary
  # C Conjunction
  # D Article/Pronoun
  # N Common Noun
  # Num Numeral
  # P Preposition
  # PN Proper Noun
  # PX Punctuation
  # V Verb

<PARENT> ::= <num> || "*"
  # roots are marked by "*"

<SYN-ROLE> ::= * || sbj || obj || iob || prd || mod
  # *   Root
  # sbj Subject
  # obj Object
  # iob Indirect Object
  # prd Predicate
  # mod Modifier

<THETA-ROLE> ::=   * || AGT || INS || EXP || THM || PRC ||
                 PRD || RSL || SRC || GOL || LOC || PTH || 
                 TMP || BEN || PRP || MOD || MNR || DGR || 
                 NIL || NON || 
                 AGT+ || INS+ || EXP+ || THM+ || PRC+ || 
                 PRD+ || RSL+ || SRC+ || GOL+ || LOC+ || 
                 PTH+ || TMP+ || BEN+ || PRP+ || MOD+ || 
                 MNR+ || DGR+ || NIL+ || NON+

  # Described in the next section -- Thematic Roles 

<FEATURES> ::= [ {<key>:<value>}* ] 

Comments can be included anywhere in a document. A comment should be preceded by a percent sign (%) AND should be on a separate line.
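The label inventories in the BNF above lend themselves to a simple sanity check before submission. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and because the BNF lists thematic roles in upper case while the example trees use lower case, the comparison of thematic roles is made case-insensitive here.

```python
POS_TAGS = {"AJ", "AV", "AX", "C", "D", "N", "Num", "P", "PN", "PX", "V"}
SYN_ROLES = {"*", "sbj", "obj", "iob", "prd", "mod"}
BASE_THETA = {"AGT", "INS", "EXP", "THM", "PRC", "PRD", "RSL", "SRC", "GOL",
              "LOC", "PTH", "TMP", "BEN", "PRP", "MOD", "MNR", "DGR", "NIL",
              "NON"}


def is_comment(line):
    """A comment sits on its own line and starts with a percent sign."""
    return line.lstrip().startswith("%")


def valid_theta(role):
    """Accept "*", a base role, or a base role with the "+" suffix."""
    role = role.upper()  # assumption: case is not significant
    if role == "*":
        return True
    if role.endswith("+"):
        role = role[:-1]
    return role in BASE_THETA


def label_errors(pos, syn_role, theta_role):
    """Return a list of human-readable problems with one node's labels."""
    errors = []
    if pos not in POS_TAGS:
        errors.append(f"unknown part of speech: {pos}")
    if syn_role not in SYN_ROLES:
        errors.append(f"unknown syntactic role: {syn_role}")
    if not valid_theta(theta_role):
        errors.append(f"unknown thematic role: {theta_role}")
    return errors


print(label_errors("PN", "sbj", "agt"))
print(label_errors("N", "obj", "xyz"))
```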

Thematic Roles

The following table lists the thematic roles used in this representation.

Note: There is no distinction between arguments and modifiers or what is obligatory or optional in a subcategorization frame.
Role    Definition    Examples
* Root: Indicates the root of the tree.
  • John broke the vase.
AGT Agent: An agent can be defined using Dowty's criteria for proto-agency as the argument with the most proto-agent properties. Thus an Agent should have the features of volition, sentience, causation and independent existence.
  • John broke the vase. (Dorr - Agent)
  • Henry pushed the door open and went in. (Gildea - Agent)
  • Henry pushed/broke the door. (VerbNet - Agent)
INS Instrument: An instrument should have causation but no volition. Its sentience and existence are not relevant. Typically, an instrument appears with an agent (present or deleted) and can be paraphrased with "using".
  • The hammer broke the vase. (Dorr - Agent)
  • She hit him with a baseball bat. (Dorr - Instrument)
  • He bought it for five dollars. (Dorr - Possessed Modifier)
  • In the children with colonic contractions fasting motility did not differentiate children with and without constipation. (Gildea - Instrument)
  • If this is the case can it be substantiated by evidence from the history of developed societies (Gildea - Force)
  • Jeez, that amazes me as well as riles me. (Gildea - Cause)
  • The hammer broke the vase. (VerbNet - Instrument)
  • She hit him with a baseball bat. (VerbNet - Instrument)
  • He bought it for five dollars. (VerbNet - Asset)
  • His death saddened us. (VerbNet - Cause)
EXP Experiencer: An experiencer has no causation but is sentient and exists independently. Typically an experiencer is the subject of verbs like feel, hear, see, sense, smell, taste, notice, discern, detect, glimpse, listen, regard, seek, scrutinize, etc.
  • John heard the vase shatter. (Dorr - Experiencer)
  • It may even have been that John anticipating his imminent doom ratified some such arrangement perhaps in the ceremony at the Jordan. (Gildea - Experiencer)
  • John shivered. (VerbNet - Experiencer)
THM Theme: The theme is typically causally affected or experiences a movement and/or change in state (thus more proto-patient-like in Dowty's terms, although its existence is independent). The theme can appear as the information in verbs like acquire, learn, memorize, read, study, etc. and as the possessed entity in verbs like own, have, possess, fit, carry, sleep. It can also be a thing, event or state (clausal complement).
  • John went to school. (Dorr - Theme)
  • John broke the vase. (Dorr - Theme)
  • John memorized his lines. (Dorr - Information)
  • She buttered the bread with margarine. (Dorr - Instrument)
  • John wanted to go home. (Dorr - Prop)
  • John has five bucks. (Dorr - Possessed)
  • This box carries five eggs. (Dorr - Possessed)
  • This cabin sleeps five people. (Dorr - Possessed)
  • He loaded the cart with hay. (Dorr - Possessed Modifier)
  • He robbed him of his rights. (Dorr - Possessed Modifier)
  • He said, ``We would urge people to be aware and be alert with fireworks because your fun might be someone else's tragedy.'' (Gildea - Topic)
  • As soon as a character lays a hand on this item, the skeletal Cleric grips it more tightly. (Gildea - Patient)
  • It says that rotation of partners does not demonstrate independence. (Gildea - Proposition)
  • John went to school. (VerbNet - Theme)
  • John broke the vase. (VerbNet - Patient)
  • She buttered the bread with margarine. (VerbNet - Theme)
  • They talked about Mary. (VerbNet - Topic)
  • John and Mary argued for hours. (VerbNet - Actor)
PRC Perceived: Refers to a perceived entity that isn't required by the verb but further characterizes the situation. The perceived is neither causally affected nor causative. It doesn't experience a movement or change in state. Its volition and sentience are irrelevant. Its existence is independent of an experiencer. PRC is often paired with EXP.
  • He saw the play. (Dorr - Perceived)
  • He looked into the room. (Dorr - Perceived)
  • The cat's fur feels good to John. (Dorr - perceptual modifier)
  • She imagined the movie to be loud. (Dorr - MOD-PROP)
  • What is apparent is that this manual is aimed at the non-specialist technician possibly an embalmer who has good knowledge of some medical procedures. (Gildea - Percept)
  • Rex spied out Sam Maggott hollering at all and sundry and making good use of his over-sized red gingham handkerchief. (Gildea - State)
PRD Predicate: Indicates new modifying information about other thematic roles. Typically, it is the predicate of verbs like be, become, consider (a fool), pronounce (dead), presume (happy), etc.
  • We considered him a fool. (Dorr - Pred)
  • We pronounced him dead. (Dorr - Pred)
  • She acted happy. (Dorr - Pred)
  • The nation elected him president. (Dorr - Mod-pred)
  • They worshiped him as their leader. (Dorr - Mod-pred)
  • She imagined him as a prince. (Dorr - Mod-pred)
RSL Result: Indicates the thing/event resulting from the verb's occurrence. This thematic role's existence is fully dependent.
  • John turned into a monkey. (Dorr - Goal)
  • She wiped the floor clean. (Dorr - Pred)
  • All the arrangements for stay-behind agents in north-west Europe collapsed, but Dansey was able to charm most of the governments in exile in London into recruiting spies. (Gildea - Result)
  • John turned into a monkey. (VerbNet - Product (turn class))
  • She wiped the floor clean. (VerbNet - Oblique[+state])
SRC Source: Indicates where/when the theme started in its motion, or what its original state was, or where its original (possibly abstract) location/time was.
  • John left the house. (Dorr - Source)
  • John ran away from home. (Dorr - Source)
  • John slept from 5pm until 10pm. (Dorr - temporal Source)
  • He heard the sound of liquid slurping in a metal container as Farrell approached him from behind. (Gildea - Source)
  • John left the house. (VerbNet - Location/Source)
GOL Goal: Indicates where the theme ends up in its motion, or what its final state is, or where/when its final (possibly abstract) location/time is.
  • John ran home. (Dorr - Goal)
  • John ran to the store. (Dorr - Goal)
  • John gave a book to Mary. (Dorr - Goal)
  • John gave Mary a book. (Dorr - Goal)
  • Distant across the river the towers of the castle rose against the sky straddling the only land approach into Shrewsbury. (Gildea - Goal)
  • John ran to the store. (VerbNet - Location/Destination)
  • John gave a book to Mary. (VerbNet - Recipient)
  • John gave Mary a book. (VerbNet - Recipient)
LOC Location: Indicates a static location, as opposed to a source or goal, i.e., the (possibly abstract) location of the theme or event. It can be an argument or a modifier, i.e., a location that isn't required by the verb but modifies the entire situation.
  • He lived in France. (Dorr - Location)
  • The water fills the box. (Dorr - Location)
  • This cabin sleeps five people. (Dorr - Location)
  • John has five bucks. (Dorr - Theme)
  • She grabbed him by the arm. (Dorr - Locational Modifier)
  • She held the child in her arm. (Dorr - Locational Modifier)
  • She coughed on John. (Dorr - Locational Modifier)
  • The box on the shelf is red. (Dorr - Locational Modifier)
  • She sang on the stage. (Dorr - Locational Modifier)
  • The book unfolded before her. (Dorr - Locational Modifier)
  • These fleshy appendages are used to detect and taste food amongst the weed and debris on the bottom of a river. (Gildea - Location)
Possible corresponding VerbNet roles: Location, Source or Destination.
PTH Path: Indicates the path taken from a source to a goal without specifying either.
  • The dung-collector ambled slowly over, one eye on Sir John. (Gildea - Path)
  • The dung-collector ambled slowly over, one eye on Sir John. (VerbNet - Prep[+path] Location)
TMP Time: Indicates time.
  • John sleeps for five hours. (Dorr - Time)
  • Mary ate during the meeting. (Dorr - Time)
  • Sam gave his speech during the conference. (Dorr - Time)
  • John arrived at nine o'clock. (Dorr - Time)
BEN Beneficiary: Indicates the thing that receives the benefit/result of the event/state.
  • John baked the cake for Mary. (Dorr - Benefactive modifier)
  • John baked Mary a cake. (Dorr - Benefactive modifier)
  • An accident happened to him. (Dorr - Benefactive modifier)
  • John baked the cake for Mary. (VerbNet - Beneficiary)
  • John baked Mary a cake. (VerbNet - Beneficiary)
PRP Purpose: Indicates the purpose/reason behind an event/state.
  • He studied for the exam. (Dorr - Purpose)
  • He searched for rabbits. (Dorr - Purpose)
MOD Modifier: Indicates a property of a thing such as color, taste, size, etc.
  • The red book ... (Dorr - Property)
  • The man who was eating ice cream was loud. (Dorr - MOD-PROP)
  • The book that unfolded endlessly before her was boring. (Dorr - MOD-PROP)
  • The book sitting on the table ... (Dorr - MOD-PROP)
MNR Manner: Indicates the manner in which an event took place.
  • She ran quickly. (Dorr - Manner)
  • The book unfolded endlessly. (Dorr - Manner)
  • His brow arched delicately. (Gildea - Manner)
DGR Degree: An event/state/thing modifier specifying degree, intensity or quantity, e.g. very, quite, rather, some, most, often.
  • I rather deplore the recent manifestation of Pop; it doesn't seem to me to have the intellectual force of the art of the Sixties. (Gildea - Degree)
NIL Null: Indicates no thematic contribution. Typical examples are impersonal it and there, modals (can, will, shall), auxiliaries (has, be), infinitive marker (to), complementizer (that), punctuation, etc.
  • Yet while she had no intention of surrendering her home, it would be foolish to let the atmosphere between them become too acrimonious. (Gildea - Null)
NON None of the Above: This role is reserved for any thematic relation not described in this table.

The Suffix "+"

All of the roles (except for *) can be assigned to multiple nodes in the same subtree. In such cases, the parent node's role is suffixed with a "+" indicating that one or more of its children is marked with that role. This situation typically occurs with prepositional phrases and with conjunctions. For example, in I saw him in the park, the whole phrase in the park indicates the location:
<tree>
  1                  I       D      2     sbj:exp  [ word:I ]
  2                see       V      *       *:*    [ word:saw ]
  3                 he       D      2     obj:prc  [ word:him ]
  4                 in       P      2     mod:loc+ [ word:in ]
  5                the       D      6     mod:mod  [ word:the ]
  6               park       N      4     obj:loc  [ word:park ]
  7                  .      PX      2     mod:nil  [ word:. ]
</tree>  
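The "+" convention can be checked automatically. The sketch below flags every node whose role carries a "+" but none of whose children carries the same role. It is an illustration only: the function name plus_errors and the tuple layout are hypothetical, and allowing a child to carry the suffixed role itself (for nested phrases) is an assumption that the specification does not spell out.

```python
def plus_errors(nodes):
    """Return numbers of nodes marked ROLE+ that have no child marked ROLE.

    nodes: list of (number, parent, theta_role); parent is None for the root.
    """
    # Collect the thematic roles of each node's children.
    children = {}
    for number, parent, theta in nodes:
        children.setdefault(parent, []).append(theta.lower())

    errors = []
    for number, parent, theta in nodes:
        theta = theta.lower()
        if theta.endswith("+"):
            base = theta[:-1]
            kids = children.get(number, [])
            # Assumption: a child may itself be marked ROLE+ (nesting).
            if not any(kid in (base, base + "+") for kid in kids):
                errors.append(number)
    return errors


# "I saw him in the park." as (number, parent, thematic role)
park = [(1, 2, "exp"), (2, None, "*"), (3, 2, "prc"), (4, 2, "loc+"),
        (5, 6, "mod"), (6, 4, "loc"), (7, 2, "nil")]
print(plus_errors(park))
```

For this tree the preposition in(4) is marked loc+ and its child park(6) is marked loc, so no node is flagged.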

The Corpus to Annotate

The corpus used is the first 25 sentences from section 00 of the Penn Treebank. These sentences were converted into normalized syntactic dependencies automatically. The first five sentences are used as examples. The task of the participants in this experiment is to annotate sentences 6 through 25 with thematic roles, replacing "role" in the fifth column of the tree representation with the appropriate role. The following is an example of what the input/output of the task looks like.
BEFORE ANNOTATION:
<doc doc_id="PTB-00-1" sys_id="<YOUR-ID>">
<tree>
  1             pierre      PN      2     mod:role [ word:Pierre ]
  2             vinken      PN      9     sbj:role [ word:Vinken ]
  3                  ,      PX      2     mod:role [ word:, ]
  4                 61     Num      5     mod:role [ word:61 ]
  5               year       N      6     mod:role [ word:years ]
  6                old      AJ      2     mod:role [ word:old ]
  7                  ,      PX      2     mod:role [ word:, ]
  8               will      AX      9     mod:role [ word:will ]
  9               join       V      *       *:role [ word:join ]
 10                the       D     11     mod:role [ word:the ]
 11              board       N      9     obj:role [ word:board ]
 12                 as       P      9     mod:role [ word:as ]
 13                  a       D     15     mod:role [ word:a ]
 14       nonexecutive      AJ     15     mod:role [ word:nonexecutive ]
 15           director       N     12     obj:role [ word:director ]
 16               nov.      PN      9     mod:role [ word:Nov. ]
 17                 29     Num     16     mod:role [ word:29 ]
 18                  .      PX      9     mod:role [ word:. ]
</tree>
</doc>
AFTER ANNOTATION:
<doc doc_id="PTB-00-1" sys_id="Nizar">
<tree>
  1             pierre      PN      2     mod:mod  [ word:Pierre ]
  2             vinken      PN      9     sbj:thm  [ word:Vinken ]
  3                  ,      PX      2     mod:nil  [ word:, ]
  4                 61     Num      5     mod:dgr  [ word:61 ]
  5               year       N      6     mod:tmp  [ word:years ]
  6                old      AJ      2     mod:mod  [ word:old ]
  7                  ,      PX      2     mod:nil  [ word:, ]
  8               will      AX      9     mod:nil  [ word:will ]
  9               join       V      *       *:*    [ word:join ]
 10                the       D     11     mod:mod  [ word:the ]
 11              board       N      9     obj:gol  [ word:board ]
 12                 as       P      9     mod:prd+ [ word:as ]
 13                  a       D     15     mod:mod  [ word:a ]
 14       nonexecutive      AJ     15     mod:mod  [ word:nonexecutive ]
 15           director       N     12     obj:prd  [ word:director ]
 16               nov.      PN      9     mod:tmp+ [ word:Nov. ]
 17                 29     Num     16     mod:tmp  [ word:29 ]
 18                  .      PX      9     mod:nil  [ word:. ]
</tree>
</doc>
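Since the task amounts to replacing every "role" placeholder in the fifth column, a quick scan can confirm that none was missed. The following Python sketch is one possible check, with a hypothetical function name; it relies only on whitespace-separated columns as shown in the trees above.

```python
def unfilled_nodes(tree_text):
    """Return node numbers whose fifth column still ends in ':role'."""
    pending = []
    for line in tree_text.splitlines():
        fields = line.split()
        # Node rows start with a number; tags and % comments do not.
        if len(fields) >= 5 and fields[0].isdigit() and fields[4].endswith(":role"):
            pending.append(int(fields[0]))
    return pending


sample = """<tree>
% partially annotated
  8               will      AX      9     mod:nil  [ word:will ]
  9               join       V      *       *:role [ word:join ]
 10                the       D     11     mod:role [ word:the ]
</tree>"""
print(unfilled_nodes(sample))
```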
NOTE: In addition to specifying the role names in column 5, annotators may also choose to modify the dependency link numbers in column 4, although this is not required. (We will determine inter-annotator agreement in two independent runs, one that takes the roles into account and another that takes the dependency tree links into account.) For example, in the "AFTER ANNOTATION" tree above, the annotator may choose to modify column 4 in line 3 so that the number "2" becomes the number "9":
   3                  ,      PX      9     mod:nil  [ word:, ]
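The two independent runs mentioned in the note can be pictured with a small Python sketch. This document does not say which agreement measure will be used, so the code below computes plain observed agreement (the fraction of nodes on which two annotators match) purely as an assumption; it is not chance-corrected. The function name and the second annotator's labels are hypothetical.

```python
def agreement(tree_a, tree_b):
    """Return (role agreement, link agreement) for two annotations.

    Each tree maps node number -> (parent, thematic role); parent is None
    for the root. Both trees must cover the same nodes.
    """
    if tree_a.keys() != tree_b.keys():
        raise ValueError("annotations cover different nodes")
    total = len(tree_a)
    same_role = sum(tree_a[n][1].lower() == tree_b[n][1].lower() for n in tree_a)
    same_link = sum(tree_a[n][0] == tree_b[n][0] for n in tree_a)
    return same_role / total, same_link / total


# Four nodes of PTB-00-1. Annotator B is invented for illustration:
# B labels Vinken as agt and attaches the first comma to node 9.
annotator_a = {2: (9, "thm"), 3: (2, "nil"), 8: (9, "nil"), 9: (None, "*")}
annotator_b = {2: (9, "agt"), 3: (9, "nil"), 8: (9, "nil"), 9: (None, "*")}
print(agreement(annotator_a, annotator_b))  # (0.75, 0.75)
```

Keeping the two scores separate mirrors the note above: a disagreement about where the comma attaches lowers the link score without affecting the role score.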

Deadline

The deadline for submission of annotations is September 24, 2002. Please send the annotated file to habash@umiacs.umd.edu.
Copyright 2002 © University of Maryland College Park. All Rights Reserved.