Introduction
This document contains the necessary information
for participating in the Interlingua Coding Experiment section of the
Interlingua Reliability Workshop. It provides a
specification of an interlingual language, examples of annotated
sentences, and 20 parsed sentences to annotate. During the workshop
the participants will discuss issues of annotation differences,
measures of inter-annotator agreement, etc.
NOTE: We ask that participants conform to the specifications below,
rather than applying a different framework. If you would like to
apply your own framework, please create two submissions, one that is
compatible with the specifications below, and one that conforms to
your own framework.
The Interlingual Representation
The interlingual representation used here consists of simple
dependency-tree structures with paired syntactic-thematic-role links. This
representation includes both syntactic and thematic information such
as parts of speech and syntactic roles in addition to thematic roles.
This information can eventually be used to model the
syntactic-semantic interface. The structure of this Interlingua is
described in detail later in this document. To get a
feeling for the representation, here is a simple example for the
sentence John broke the old vase:
EXAMPLE 1:
<doc doc_id="EX-1" sys_id="Nizar">
<tree>
1 john PN 2 sbj:agt [ word:John ]
2 break V * *:* [ word:broke ]
3 the D 5 mod:mod [ word:the ]
4 old AJ 5 mod:mod [ word:old ]
5 vase N 2 obj:thm [ word:vase ]
6 . PX 2 mod:nil [ word:. ]
</tree>
</doc>
The sentence, John broke the old vase, is specified as document
"EX-1" with "Nizar" as the ID of the "system" that annotated it. The
tree representing the sentence is enclosed between the tags
<tree> and </tree>. The left-most column indicates the
tree node number, which corresponds to the surface order of the word
represented by the node. The second and third columns indicate the
word lexeme and its part of speech respectively. The fourth column
indicates the parent node of the node specified in a given row. For
example, the parent, or dominating node, of John(1) and
vase(5) is break(2). The fifth column indicates the
syntactic-plus-thematic relationship of the node in a given row to its
parent. For example, John(1) is syntactically the subject
(sbj) and thematically the agent (agt) of the verb
break(2). Similarly, vase(5) is syntactically the object
(obj) and thematically the theme (thm) of the verb
break(2). If the sentence were The vase broke,
vase would still be the theme although it is syntactically a
subject. This representation pays less attention to detailing the
semantics of words other than arguments and modifiers of verbs. For
example, in the sentence above, old is marked as a syntactic
modifier and thematic modifier (versus a more detailed description
such as age). Punctuation in this sentence is considered a
non-contributor semantically, so it receives the thematic role
nil. The top node of the dependency tree (i.e. its root) is
indicated by *:*. The last column indicates
features of the different nodes. Here, the surface word is included
as a feature.
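The column layout described above can be read mechanically. The following is a minimal sketch in Python, not part of the workshop tooling: the names Node, NODE_RE and parse_node are hypothetical, and the sketch assumes that each node occupies exactly one line and that features are whitespace-separated key:value pairs.

```python
import re
from typing import NamedTuple


class Node(NamedTuple):
    number: int       # column 1: surface position of the word
    lexeme: str       # column 2
    pos: str          # column 3: part of speech
    parent: str       # column 4: parent node number as text, or "*" for the root
    syn_role: str     # column 5, before the colon
    theta_role: str   # column 5, after the colon
    features: dict    # last column, e.g. {"word": "John"}


# number, lexeme, POS, parent, syntactic:thematic link, [ features ]
NODE_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+([^\s:]+):(\S+)\s+\[(.*)\]\s*$"
)


def parse_node(line: str) -> Node:
    """Split one node line into its columns (assumed layout, see above)."""
    match = NODE_RE.match(line)
    if match is None:
        raise ValueError(f"not a node line: {line!r}")
    number, lexeme, pos, parent, syn, theta, feats = match.groups()
    # Each feature is split on its first colon only, so values may contain ":".
    features = dict(item.split(":", 1) for item in feats.split())
    return Node(int(number), lexeme, pos, parent, syn, theta, features)


vase = parse_node("5 vase N 2 obj:thm [ word:vase ]")
print(vase.parent, vase.syn_role, vase.theta_role)  # prints: 2 obj thm
```

The parent is kept as text rather than converted to an integer so that the root marker "*" needs no special case.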
Representation Definition
Document Structure
Every sentence is considered a document
with a specific doc_id. Different analyses of the sentences are
distinguished by their sys_id. A dependency tree is represented as a
list of node information contained between the tags <tree> and
</tree>. The following is a BNF description of the format of the
sentences:
<DOC> ::= <doc doc_id="<doc-id>" sys_id="<sys_id>">
<TREE>
</doc>
<TREE> ::=
<tree>
{<NODE-NUMBER> <lexeme> <POS> <PARENT> <SYN-ROLE>:<THETA-ROLE> <FEATURES>}*
</tree>
<NODE-NUMBER> ::= <num>
# the surface position of the word (1, 2, 3, ...)
<POS> ::= AJ || AV || AX || C || D || N || Num || P || PN || PX || V
# AJ Adjective
# AV Adverb
# AX Auxiliary
# C Conjunction
# D Article/Pronoun
# N Common Noun
# Num Numeral
# P Preposition
# PN Proper Noun
# PX Punctuation
# V Verb
<PARENT> ::= <num> || "*"
# roots are marked by "*"
<SYN-ROLE> ::= * || sbj || obj || iob || prd || mod
# * Root
# sbj Subject
# obj Object
# iob Indirect Object
# prd Predicate
# mod Modifier
<THETA-ROLE> ::= * || AGT || INS || EXP || THM || PRC ||
PRD || RSL || SRC || GOL || LOC || PTH ||
TMP || BEN || PRP || MOD || MNR || DGR ||
NIL || NON ||
AGT+ || INS+ || EXP+ || THM+ || PRC+ ||
PRD+ || RSL+ || SRC+ || GOL+ || LOC+ ||
PTH+ || TMP+ || BEN+ || PRP+ || MOD+ ||
MNR+ || DGR+ || NIL+ || NON+
# Described in the next section -- Thematic Roles
# (the example trees in this document write these roles in lowercase, e.g. agt, thm)
<FEATURES> ::= [ {<key>:<value>}* ]
Comments can be included anywhere in a document. A comment should be
preceded by a percent sign (%) AND should be on a separate line.
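The grammar above lends itself to a quick sanity check before submission. The following is a rough sketch in Python under stated assumptions: the function names are hypothetical, role names are compared case-insensitively (the grammar lists them in uppercase while the examples use lowercase), node numbers are assumed to be integers as in all the examples, and each node is assumed to sit on one line.

```python
POS_TAGS = {"AJ", "AV", "AX", "C", "D", "N", "Num", "P", "PN", "PX", "V"}
SYN_ROLES = {"*", "sbj", "obj", "iob", "prd", "mod"}
BASE_THETA_ROLES = {
    "AGT", "INS", "EXP", "THM", "PRC", "PRD", "RSL", "SRC", "GOL", "LOC",
    "PTH", "TMP", "BEN", "PRP", "MOD", "MNR", "DGR", "NIL", "NON",
}


def is_valid_theta(role):
    """Accept "*", a listed role, or a listed role followed by "+"."""
    if role == "*":
        return True
    base = role[:-1] if role.endswith("+") else role
    return base.upper() in BASE_THETA_ROLES


def node_lines(text):
    """Yield the lines between <tree> and </tree>, skipping % comment lines."""
    in_tree = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        if line == "<tree>":
            in_tree = True
        elif line == "</tree>":
            in_tree = False
        elif in_tree:
            yield line


def check_node(line):
    """Return a list of problems found in one node line (empty if none)."""
    fields = line.split(None, 5)
    if len(fields) < 6:
        return ["expected six columns"]
    number, _lexeme, pos, parent, link, features = fields
    syn, _, theta = link.partition(":")
    problems = []
    if not number.isdigit():
        problems.append(f"bad node number {number!r}")
    if pos not in POS_TAGS:
        problems.append(f"unknown part of speech {pos!r}")
    if parent != "*" and not parent.isdigit():
        problems.append(f"bad parent {parent!r}")
    if syn not in SYN_ROLES:
        problems.append(f"unknown syntactic role {syn!r}")
    if not is_valid_theta(theta):
        problems.append(f"unknown thematic role {theta!r}")
    if not (features.startswith("[") and features.endswith("]")):
        problems.append("features must be enclosed in [ ]")
    return problems
```

Running check_node over every line yielded by node_lines gives a per-node report. A node that still carries a placeholder instead of a role is reported as having an unknown thematic role.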
Thematic Roles
The following table lists the thematic roles used in this representation.
Note: There is no distinction between arguments and modifiers or what is
obligatory or optional in a subcategorization frame.
| Role | Definition | Examples |
| * |
Root: Indicates the root of the tree. |
|
| AGT |
Agent: An agent can be defined using Dowty's criteria for
proto-agency as the most prototypical proto-agent. Thus an Agent should
have the features of volition, sentience, causation and independent
existence.
|
- John broke the vase. (Dorr - Agent)
- Henry pushed the door open and went in. (Gildea - Agent)
- Henry pushed/broke the door. (VerbNet - Agent)
|
| INS |
Instrument: An instrument should have causation but no volition. Its sentience and existence are not relevant. Typically, an instrument appears with an agent (present or deleted) and can be paraphrased with "using". |
- The hammer broke the vase. (Dorr - Agent)
- She hit him with a baseball bat. (Dorr - Instrument)
- He bought it for five dollars. (Dorr - Possessed Modifier)
- In the children with colonic contractions fasting motility did
not differentiate children with and without
constipation. (Gildea - Instrument)
- If this is the case can it be substantiated by
evidence from the history of developed societies (Gildea - Force)
- Jeez, that amazes me as well as riles me. (Gildea - Cause)
- The hammer broke the vase. (VerbNet - Instrument)
- She hit him with a baseball bat. (VerbNet - Instrument)
- He bought it for five dollars. (VerbNet - Asset)
- His death saddened us. (VerbNet - Cause)
|
| EXP |
Experiencer: An experiencer has no causation but is
sentient and exists independently. Typically an experiencer is the
subject of verbs like feel, hear, see, sense, smell, taste, notice,
discern, detect, glimpse, listen, regard, seek, scrutinize, etc.
|
- John heard the vase shatter. (Dorr - Experiencer)
- It may even have been that John anticipating his
imminent doom ratified some such arrangement perhaps in the ceremony at the Jordan. (Gildea - Experiencer)
- John shivered. (VerbNet - Experiencer)
|
| THM |
Theme: The theme is typically causally affected or
experiences a movement and/or change in state (thus more
proto-patient-like in Dowty's terms although its existence is
independent). The theme can appear as the information in verbs like
acquire, learn, memorize, read, study, etc. and as the possessed
entity in verbs like own, have, possess, fit, carry, sleep. It can also be
a thing, event or state (clausal complement).
|
- John went to school. (Dorr - Theme)
- John broke the vase. (Dorr - Theme)
- John memorized his lines. (Dorr - Information)
- She buttered the bread with margarine. (Dorr - Instrument)
- John wanted to go home. (Dorr - Prop)
- John has five bucks. (Dorr - Possessed)
- This box carries five eggs. (Dorr - Possessed)
- This cabin sleeps five people. (Dorr - Possessed)
- He loaded the cart with hay. (Dorr - Possessed Modifier)
- He robbed him of his rights. (Dorr - Possessed Modifier)
- He said, "We would urge people to be aware and be alert
with fireworks because your fun might be someone else's tragedy." (Gildea - Topic)
- As soon as a character lays a hand on this item, the skeletal Cleric grips it more tightly. (Gildea - Patient)
- It says that rotation of partners does not demonstrate independence. (Gildea - Proposition)
- John went to school. (VerbNet - Theme)
- John broke the vase. (VerbNet - Patient)
- She buttered the bread with margarine. (VerbNet - Theme)
- They talked about Mary. (VerbNet - Topic)
- John and Mary argued for hours. (VerbNet - Actor)
|
| PRC |
Perceived: Refers to a perceived entity that isn't required
by the verb but further characterizes the situation. The perceived is
neither causally affected nor causative. It doesn't experience a
movement or change in state. Its volition and sentience are
irrelevant. Its existence is independent of an experiencer. PRC is
often paired with EXP. |
- He saw the play. (Dorr - Perceived)
- He looked into the room. (Dorr - Perceived)
- The cat's fur feels good to John. (Dorr - perceptual modifier)
- She imagined the movie to be loud. (Dorr - MOD-PROP)
- What is apparent is that this manual is aimed at the non-specialist
technician possibly an embalmer who has good knowledge of some medical
procedures. (Gildea - Percept)
- Rex spied out Sam Maggott hollering at all and sundry and making
good use of his over-sized red gingham handkerchief. (Gildea - State)
|
| PRD |
Predicate:
Indicates new modifying information about other thematic roles.
Typically, it is the predicate of verbs like be, become, consider (a
fool), pronounce (dead), presume (happy), etc.
|
- We considered him a fool. (Dorr - Pred)
- We pronounced him dead. (Dorr - Pred)
- She acted happy. (Dorr - Pred)
- The nation elected him president. (Dorr - Mod-pred)
- They worshiped him as their leader. (Dorr - Mod-pred)
- She imagined him as a prince. (Dorr - Mod-pred)
|
| RSL |
Result: Indicates the thing/event resulting from the verb's
occurrence. The existence of the result depends fully on that event.
|
- John turned into a monkey. (Dorr - Goal)
- She wiped the floor clean. (Dorr - Pred)
- All the arrangements for stay-behind agents in north-west Europe collapsed,
but Dansey was able to charm most of the governments in exile in London
into recruiting spies. (Gildea - Result)
- John turned into a monkey. (VerbNet - Product (turn class))
- She wiped the floor clean. (VerbNet - Oblique[+state])
|
| SRC |
Source: Indicates where/when the theme started in its motion, or what
its original state was, or where its original (possibly abstract) location/time was.
|
- John left the house. (Dorr - Source)
- John ran away from home. (Dorr - Source)
- John slept from 5pm until 10pm. (Dorr - temporal Source)
- He heard the sound of liquid slurping in a metal container as Farrell
approached him from behind. (Gildea - Source)
- John left the house. (VerbNet - Location/Source)
|
| GOL |
Goal: Indicates where the theme ends up in its motion, or what
its final state is, or where/when its final (possibly abstract) location/time is. |
- John ran home. (Dorr - Goal)
- John ran to the store. (Dorr - Goal)
- John gave a book to Mary. (Dorr - Goal)
- John gave Mary a book. (Dorr - Goal)
- Distant across the river the towers of the castle rose
against the sky straddling the only land approach into Shrewsbury. (Gildea - Goal)
- John ran to the store. (VerbNet - Location/Destination)
- John gave a book to Mary. (VerbNet - Recipient)
- John gave Mary a book. (VerbNet - Recipient)
|
| LOC |
Location:
Indicates a static (possibly abstract) location of the theme or
event, as opposed to a source or goal. It can also be a modifier
rather than an argument, i.e., a location that isn't required by
the verb but modifies the entire situation. |
- He lived in France. (Dorr - Location)
- The water fills the box. (Dorr - Location)
- This cabin sleeps five people. (Dorr - Location)
- John has five bucks. (Dorr - Theme)
- She grabbed him by the arm. (Dorr - Locational Modifier)
- She held the child in her arm. (Dorr - Locational Modifier)
- She coughed on John. (Dorr - Locational Modifier)
- The box on the shelf is red. (Dorr - Locational Modifier)
- She sang on the stage. (Dorr - Locational Modifier)
- The book unfolded before her. (Dorr - Locational Modifier)
- These fleshy appendages are used to detect and taste food
amongst the weed and debris on the bottom of a river. (Gildea - Location)
Possible corresponding VerbNet roles: Location, Source or Destination.
|
| PTH |
Path: Indicates the path taken from a source to a goal without specifying either. |
- The dung-collector ambled slowly over, one eye on Sir John. (Gildea - Path)
- The dung-collector ambled slowly over, one eye on Sir John. (VerbNet - Prep[+path] Location)
|
| TMP |
Time: Indicates time. |
- John sleeps for five hours. (Dorr - Time)
- Mary ate during the meeting. (Dorr - Time)
- Sam gave his speech during the conference. (Dorr - Time)
- John arrived at nine o'clock. (Dorr - Time)
|
| BEN |
Beneficiary: Indicates the thing that receives the benefit/result of the event/state. |
- John baked the cake for Mary. (Dorr - Benefactive modifier)
- John baked Mary a cake. (Dorr - Benefactive modifier)
- An accident happened to him. (Dorr - Benefactive modifier)
- John baked the cake for Mary. (VerbNet - Beneficiary)
- John baked Mary a cake. (VerbNet - Beneficiary)
|
| PRP |
Purpose: Indicates the purpose/reason behind an event/state. |
- He studied for the exam. (Dorr - Purpose)
- He searched for rabbits. (Dorr - Purpose)
|
| MOD |
Modifier: Indicates a property of a thing such as color, taste, size, etc.
|
- The red book ... (Dorr - Property)
- The man who was eating ice cream was loud. (Dorr - MOD-PROP)
- The book that unfolded endlessly before her was boring. (Dorr - MOD-PROP)
- The book sitting on the table ... (Dorr - MOD-PROP)
|
| MNR |
Manner: Indicates the manner in which an event took place.
|
- She ran quickly. (Dorr - Manner)
- The book unfolded endlessly. (Dorr - Manner)
- His brow arched delicately. (Gildea - Manner)
|
| DGR |
Degree: An Event/State/Thing modifier specifying degree, intensity or quantity, e.g. very, quite, rather, some, most, often. |
- I rather deplore the recent manifestation of Pop; it
doesn't seem to me to have the intellectual force of the art of the
Sixties. (Gildea - Degree)
|
| NIL |
Null: Indicates no thematic contribution. Typical examples are impersonal it and there, modals (can, will, shall),
auxiliaries (has, be), infinitive marker (to), complementizer (that), punctuation, etc.
|
- Yet while she had no intention of surrendering her home,
it would be foolish to let the atmosphere between them
become too acrimonious. (Gildea - Null)
|
| NON |
None of the Above: This role is reserved for any thematic relation not described in this table. |
|
The Suffix "+"
All of the roles (except for *) can be assigned to multiple
nodes in the same subtree. In such cases, the parent node's role is
suffixed with a "+" indicating that one or more of its children is
marked with that role. This situation typically occurs with
prepositional phrases and with conjunctions. For example, in the sentence
I saw him in the park, the whole phrase in the park indicates the location:
<tree>
1 I D 2 sbj:exp [ word:I ]
2 see V * *:* [ word:saw ]
3 he D 2 obj:prc [ word:him ]
4 in P 2 mod:loc+ [ word:in ]
5 the D 6 mod:mod [ word:the ]
6 park N 4 obj:loc [ word:park ]
7 . PX 2 mod:nil [ word:. ]
</tree>
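The "+" convention can be checked automatically. The following is a small Python sketch, not an official validator: plus_violations is a hypothetical name, and the sketch assumes that a child satisfies its parent's "+" role if it carries the same role with or without its own "+" suffix.

```python
def plus_violations(tree_lines):
    """Return the numbers of nodes whose "+" role is not carried by any child."""
    nodes = []
    for line in tree_lines:
        number, _lexeme, _pos, parent, link = line.split()[:5]
        theta = link.split(":", 1)[1].lower()
        nodes.append((number, parent, theta))

    # Collect, for every parent, the base roles (without "+") of its children.
    child_roles = {}
    for _number, parent, theta in nodes:
        child_roles.setdefault(parent, set()).add(theta.rstrip("+"))

    violations = []
    for number, _parent, theta in nodes:
        if theta.endswith("+") and theta[:-1] not in child_roles.get(number, set()):
            violations.append(int(number))
    return violations


park = [
    "1 I D 2 sbj:exp [ word:I ]",
    "2 see V * *:* [ word:saw ]",
    "3 he D 2 obj:prc [ word:him ]",
    "4 in P 2 mod:loc+ [ word:in ]",
    "5 the D 6 mod:mod [ word:the ]",
    "6 park N 4 obj:loc [ word:park ]",
    "7 . PX 2 mod:nil [ word:. ]",
]
print(plus_violations(park))  # prints: []
```

Here node 4 (in) is marked loc+ and its child, node 6 (park), carries loc, so no violation is reported.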
The Corpus to Annotate
The corpus used is the first 25 sentences from section 00 of the Penn
Treebank. These sentences were converted into normalized syntactic
dependencies automatically. The first five sentences are used as
examples. The task of the participants in this experiment is to
annotate sentences 6 through 25 with thematic roles, replacing "role"
in the fifth column of the tree representation with the appropriate
role. The following is an example of what the input/output of the
task looks like.
BEFORE ANNOTATION:
<doc doc_id="PTB-00-1" sys_id="<YOUR-ID>">
<tree>
1 pierre PN 2 mod:role [ word:Pierre ]
2 vinken PN 9 sbj:role [ word:Vinken ]
3 , PX 2 mod:role [ word:, ]
4 61 Num 5 mod:role [ word:61 ]
5 year N 6 mod:role [ word:years ]
6 old AJ 2 mod:role [ word:old ]
7 , PX 2 mod:role [ word:, ]
8 will AX 9 mod:role [ word:will ]
9 join V * *:role [ word:join ]
10 the D 11 mod:role [ word:the ]
11 board N 9 obj:role [ word:board ]
12 as P 9 mod:role [ word:as ]
13 a D 15 mod:role [ word:a ]
14 nonexecutive AJ 15 mod:role [ word:nonexecutive ]
15 director N 12 obj:role [ word:director ]
16 nov. PN 9 mod:role [ word:Nov. ]
17 29 Num 16 mod:role [ word:29 ]
18 . PX 9 mod:role [ word:. ]
</tree>
</doc>
AFTER ANNOTATION:
<doc doc_id="PTB-00-1" sys_id="Nizar">
<tree>
1 pierre PN 2 mod:mod [ word:Pierre ]
2 vinken PN 9 sbj:thm [ word:Vinken ]
3 , PX 2 mod:nil [ word:, ]
4 61 Num 5 mod:dgr [ word:61 ]
5 year N 6 mod:tmp [ word:years ]
6 old AJ 2 mod:mod [ word:old ]
7 , PX 2 mod:nil [ word:, ]
8 will AX 9 mod:nil [ word:will ]
9 join V * *:* [ word:join ]
10 the D 11 mod:mod [ word:the ]
11 board N 9 obj:gol [ word:board ]
12 as P 9 mod:prd+ [ word:as ]
13 a D 15 mod:mod [ word:a ]
14 nonexecutive AJ 15 mod:mod [ word:nonexecutive ]
15 director N 12 obj:prd [ word:director ]
16 nov. PN 9 mod:tmp+ [ word:Nov. ]
17 29 Num 16 mod:tmp [ word:29 ]
18 . PX 9 mod:nil [ word:. ]
</tree>
</doc>
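Since the task amounts to replacing every "role" placeholder, a short script can list the nodes that still need attention. This is an illustrative Python sketch only; unannotated_nodes is a hypothetical helper, and it assumes one node per line with the link in the fifth column.

```python
def unannotated_nodes(tree_lines):
    """Return the node numbers whose thematic role is still the placeholder "role"."""
    pending = []
    for line in tree_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # not a node line
        theta = fields[4].split(":", 1)[-1]
        if theta == "role":
            pending.append(int(fields[0]))
    return pending


partly_done = [
    "8 will AX 9 mod:nil [ word:will ]",
    "9 join V * *:role [ word:join ]",
    "10 the D 11 mod:mod [ word:the ]",
    "11 board N 9 obj:role [ word:board ]",
]
print(unannotated_nodes(partly_done))  # prints: [9, 11]
```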
NOTE: In addition to specifying the role names in column 5, annotators may
also choose to modify the dependency link numbers in column 4,
although this is not required. (We will determine inter-annotator
agreement in two independent runs, one that takes the roles into
account and another that takes the dependency tree links into
account.) For example, in the "AFTER ANNOTATION" tree above, the
annotator may choose to modify column 4 in line 3 so that the
number "2" becomes the number "9":
3 , PX 9 mod:nil [ word:, ]
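This document does not say which agreement measure will be used. As an illustration only, the following Python sketch computes simple per-node percent agreement between two annotations of the same sentence, once for the dependency links (column 4) and once for the thematic roles (column 5). The function names are hypothetical, and the actual workshop measures may differ.

```python
def read_columns(tree_lines):
    """Map node number -> (parent, thematic role) for one annotated tree."""
    table = {}
    for line in tree_lines:
        number, _lexeme, _pos, parent, link = line.split()[:5]
        table[number] = (parent, link.split(":", 1)[1].lower())
    return table


def agreement(tree_a, tree_b):
    """Return (link agreement, role agreement) as fractions of shared nodes."""
    a, b = read_columns(tree_a), read_columns(tree_b)
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("the two trees have no nodes in common")
    links = sum(a[n][0] == b[n][0] for n in shared) / len(shared)
    roles = sum(a[n][1] == b[n][1] for n in shared) / len(shared)
    return links, roles


annotator_a = [
    "3 , PX 2 mod:nil [ word:, ]",
    "9 join V * *:* [ word:join ]",
]
annotator_b = [
    "3 , PX 9 mod:nil [ word:, ]",
    "9 join V * *:* [ word:join ]",
]
print(agreement(annotator_a, annotator_b))  # prints: (0.5, 1.0)
```

The two annotators attach the comma to different parents, so they agree on one of two links but on both roles. Keeping the two scores separate mirrors the two independent runs described above.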
Deadline
The deadline for submission of annotations is September 24, 2002.
Please send the annotated file to habash@umiacs.umd.edu.
Copyright 2002 © University of Maryland College Park. All Rights Reserved.