First two sample texts to be annotated
IAMTC
Text for Annotation Dry Run to Refine Annotation Manual
- For Dec 12: take the hand-corrected FS files (see below), add IL1 info. Use 3rd set of instructions.
- For Dec 19: have annotators take the complete hand-corrected FS files (see below), add IL1 info. Use 3rd set of instructions.
The following two texts have been selected from the French corpus to enable researchers (and annotators, if desired) to begin to apply the annotation standard. If there are problems or questions, comments can be made directly in the annotation manual section of the IAMTC wiki, per the instructions of Owen, the Annotation Manual Czar.
F1E1 F1E4
Note that these are two English translations of the same French original text, so that in addition to debugging the annotation standard, we can begin to discuss differences in the English translations.
Parser Output and FS Files
- Connexor Parser (Nizar)
- LDA Parser (Owen)
- Here are the hand-corrected fs files for F1E1. Note that only the first 3 sentences have actually been checked.
- Here is another version of the same files: fs files for F1E1. Unlike the previous version, this one has features for filling in Omega concepts and theta roles, but it may not be compatible with Tiamat.
- Here is the complete version of the same files in the Tiamat-compatible format: fs files for F1E1.
- F1E4.fs: fs file for F1E4.
- Text Files for Use with Tiamat
- Here is a .txt file that works for us with Owen's .fs files: owen.txt. Don't forget that it must have the same name as your .fs file (apart from the extension).
- Text file for F1E4.fs (untested): txt file for F1E4
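The naming requirement above (each .fs file needs a .txt file with the same name) is easy to check mechanically before opening Tiamat. The following is only a minimal sketch: the function name `missing_text_files` is made up for illustration, and it assumes the .fs and .txt files sit together in one directory.

```python
from pathlib import Path


def missing_text_files(directory):
    """Return the names of .fs files in `directory` that lack a same-named .txt file."""
    directory = Path(directory)
    return sorted(
        fs_file.name
        for fs_file in directory.glob("*.fs")
        # F1E4.fs is expected to be accompanied by F1E4.txt, and so on.
        if not fs_file.with_suffix(".txt").exists()
    )
```

Calling `missing_text_files(".")` in your working directory lists any .fs file that still needs a matching text file; an empty list means every .fs file has one.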
NOTE: new instructions for use with these FS files below, after the old instructions!
From Owen:
This is an excellent idea. I suggest we take the following sentences from F1E1 (the first two):
#Sentence TEXT 010 As CSA faces serious difficulties, the Czechs offer to buy back the French shares in their national airline
#Sentence The Czech Minister of Transportation, Jan Strasky, proposed on Saturday, January 8 that the State buy back the shares (just less than 20%) held by Air France and the Reserve Bank/deposit accounts in the national airline, CSA, so that they could be sold "to another partner, probably a bank."
We do the following (pencil-and-paper):
- Start with the Connexor parse tree.
- Correct it as needed. Be sure to annotate with deep-syntactic relations.
- Choose Omega concepts for verbs, nouns, adjectives, adverbs using the web interface.
- For verbs, given the choice of concept, annotate the arguments/adjuncts with role labels using the LCS list that Nizar sent out a pointer to (LCS-based WordNet-Verb-ThetaRoles list).
- Put the results up on the Wiki in the same format as Nizar's output.
From Ed, after Friday's meeting:
- Start with the Connexor parse tree.
- Correct it as needed. Be sure to annotate with deep-syntactic relations.
- For each verb and noun, understand what it really means. Find a suitable high-level concept in Omega, preferring to use more general ones (in caps).
- Then, still in Omega, look at the options provided by WordNet (lowercase), and find something suitable. For each option, it might help to click on the word sense (left window) and look at the hierarchy structure (right window), to see whether the concept you have chosen in the preceding step appears in the hierarchy structure immediately above your word sense. If so, you're happy. If not, think again about the word sense and/or the general concept you have chosen.
- For verbs, annotate the arguments/adjuncts with role labels using the LCS list that Nizar sent out a pointer to (LCS-based WordNet-Verb-ThetaRoles list).
- Fill in the concepts that you have found for the nouns and verbs, in the same format as Nizar's output.
- Put the results up on the Wiki.
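The consistency check in Ed's steps (does the general concept you chose appear in the hierarchy above the word sense you chose?) can be pictured as a walk up a parent chain. The sketch below is purely illustrative: it does not use any Omega interface, the concept names in `TOY_HIERARCHY` are invented, and it simplifies by giving each concept a single parent, which a real ontology need not do.

```python
def ancestors(concept, parent_of):
    """Return the chain of concepts above `concept`, nearest first."""
    chain, seen = [], {concept}
    while concept in parent_of:
        concept = parent_of[concept]
        if concept in seen:  # guard against accidental cycles in the toy data
            break
        seen.add(concept)
        chain.append(concept)
    return chain


def general_concept_fits(word_sense, general_concept, parent_of):
    """True if `general_concept` lies somewhere above `word_sense`."""
    return general_concept in ancestors(word_sense, parent_of)


# Hypothetical toy hierarchy (child -> parent); these names are invented
# for illustration and are not actual Omega or WordNet entries.
TOY_HIERARCHY = {
    "buy-sense-1": "acquire-sense-1",
    "acquire-sense-1": "GENERAL-EVENT-A",
    "sell-sense-1": "GENERAL-EVENT-B",
}
```

With this toy data, `general_concept_fits("buy-sense-1", "GENERAL-EVENT-A", TOY_HIERARCHY)` holds, while the same word sense paired with `"GENERAL-EVENT-B"` does not, which is the "think again" case in the instructions.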
Instructions for Annotating IL1 from Hand-Corrected IL0 (Owen, Dec 11 03)
- Start with the FS file.
- Alter the syntax only if you find important errors (for example, if you disagree with the analysis of "39%", make a note, but do not change the analysis).
- Using Tiamat, annotate meaning for nouns and verbs, and arc labels, saving the altered fs file.
- Document anything problematic in the comment field of the appropriate node.
- Put the results (new fs file) up on the Wiki.
Instructions for Annotating IL1 from Hand-Corrected IL0 (Ed, Jan 12, 04, building on Owen's)
- Download the fs file from the Wiki page.
- Alter the syntax only if you find important errors (for example, if you disagree with the analysis of "39%", make a note, but do not change the analysis).
- Using Tiamat, annotate meaning for nouns, verbs, adjectives, and adverbs (all these words are highlighted by the tool) and theta roles for certain arc labels. In particular:
  - Read the sentence carefully, and try to grasp its meaning. It helps if you re-state the sentence to yourself in new words, to get the essential meaning of all the important words. For example, "he took a bath" really means "he bathed"; the "took" means nothing.
  - Find the next N, V, Adj, or Adv. Do not do Pronouns ("he", "they", etc.).
  - Considering its meaning (from your understanding/restatements), find its appropriate WordNet concept in Omega. Choose this in Tiamat.
  - Also try to find its appropriate Mikrokosmos concept (these are usually capitalized, with a $ inside). Also record this in Tiamat.
  - For Verbs only: also find the theta roles that fit best. Tiamat will display some; if it doesn't, or if you don't like the ones it displays, then edit to make your own, or to correct them. Also record this in Tiamat.
  - For multi-word names (like "John Smith" and "Air France" and "San Francisco" and "January 6"), decide which type of entity this is (it will be a Date, Person, Location, Company, etc.), and annotate just the head word (that is, the one labeled Subj or Obj, not any labeled Adj).
- Document anything problematic in the comment field of the appropriate node.
- Make sure you save the altered fs file.
- Put the results (new fs file) up on the Wiki in your own space.
Combined Annotations
The inter-annotator agreement is measured for annotations submitted by: David, Jad, Jeff, Namhee, Nizar, Soomin, Steve and Tim.
All Annotations
All Annotations + No Annotation
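This page does not say which agreement measure was used, so the following is only a sketch of one simple option, average pairwise agreement. The function name, the data layout (annotator name mapped to node id mapped to label), and the `count_missing` switch are all assumptions made for illustration. The switch is meant to mirror the two views listed above: skip nodes that one of the two annotators left blank, or treat "no annotation" as a label in its own right.

```python
from itertools import combinations

NO_ANNOTATION = None  # stands for a node that an annotator left blank


def pairwise_agreement(annotations, count_missing=False):
    """Fraction of (annotator pair, node) comparisons where the labels match.

    annotations: dict mapping annotator name -> dict of node id -> label.
    count_missing: if True, a blank node counts as the label NO_ANNOTATION;
                   if False, comparisons involving a blank node are skipped.
    """
    agree = total = 0
    for a, b in combinations(sorted(annotations), 2):
        nodes = set(annotations[a]) | set(annotations[b])
        for node in nodes:
            label_a = annotations[a].get(node, NO_ANNOTATION)
            label_b = annotations[b].get(node, NO_ANNOTATION)
            if not count_missing and NO_ANNOTATION in (label_a, label_b):
                continue
            total += 1
            agree += label_a == label_b
    return agree / total if total else 0.0
```

Raw agreement of this kind does not correct for chance; a chance-corrected statistic such as kappa would need the label distributions as well.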
Version 55, Fri 09 Jan 2004 15:38:32 [EH] - created Fri 14 Nov 2003 14:40:03 [KJM]