
Korean IL0->IL1 Annotation Manual

At the time of this writing, Tiamat_CL3 should have been installed and tested. This annotation procedure assumes that version of the annotool.

I. SETTING THE LANGUAGE OPTION

First, set the annotation tool to work with Korean. Go to "Options", choose "Set Language", and enter the name of a Korean font, for example, "Baekmuk Batang".

II. LOADING A NEW FILE

Most of the files you will be working with will show up on the annotators' wiki. You should know how to copy a file from the wiki and save it in your workspace. All files should have a .fs extension.

In the upper left-hand corner of the annotool window, click on "New FS File". This lets you select a file to load by navigating through your directories. After you select a file and click on "Open File", the text of the article should appear in the large box on the left, with the first sentence highlighted in yellow. Each sentence of the text also has an identifying code at the beginning; you do not need to worry about this code.

The first sentence should also appear in the smaller box below it, with a number of the lexical items (words) on a gray background. These are the words you will annotate.

The procedure is the same for loading a file you have already partially completed.

III. SAVING A FILE

When you want to save your work, choose "Save As" from the "File" menu in the upper left-hand corner. As with "New FS File", a separate window will pop up, allowing you to decide which directory to save your file in. Give the file a different name, or it will overwrite the initial file you loaded. Be sure that your new filename has a .fs extension.

Once you have saved a file, you can then click on "Save" in the File menu to save a newer version.

IV. MOVING FROM SENTENCE TO SENTENCE IN A FILE

Use the "Previous Sentence" and "Next Sentence" buttons to move through the file being annotated. The current sentence should be highlighted in yellow.

V. VIEWING THE GRAPH OF A SENTENCE

Click on "Show Graph" at the bottom of the left-hand side of the tool. This will pop up a window that displays the "semantic dependency graph" of the sentence.

Each sentence of the file is prefixed by an identifying code. This code will show up as the very top node of the dependency structure, along with the complete sentence. For the purposes of annotation, you may ignore this node.

The dependency structure shows how the words in the sentence depend on each other. For example, in the sentence "John kissed Mary on the cheek.", "kiss" is at the top of the tree (underneath the node with identifying information), since it specifies the event the sentence is about. "John" and "Mary" both depend on "kiss", since they tell who was involved in the event. "On the cheek" also depends on "kiss", since it tells what part of Mary's body John kissed. "Cheek" is the object of the preposition "on" and so depends on it semantically. This information is partially semantic and partially syntactic. If the sentence had been, for instance, "John kissed Mary's cheek.", then "John" and "cheek" would depend on "kiss", while "Mary" would be a specifier and thus a dependent of "cheek".
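To make the idea of "depending on" concrete, the example can be written down as a table that maps each word to its head (the word it hangs from). This is a hypothetical representation chosen only for illustration; it is not the format the annotool or tred actually stores, and the identifying-code node and the word "the" are left out.

```python
# Hypothetical head table for "John kissed Mary on the cheek."
# Each word maps to the word it depends on; the root maps to None.
# Illustrative sketch only, not the annotool/tred file format;
# the identifying-code node and the word "the" are left out.
HEADS = {
    "kiss": None,    # root: the event the sentence is about
    "John": "kiss",  # who was involved in the event
    "Mary": "kiss",
    "on": "kiss",    # "on the cheek" attaches to the verb
    "cheek": "on",   # object of the preposition "on"
}

# "John kissed Mary's cheek.": "Mary" becomes a specifier of "cheek".
HEADS_POSSESSIVE = {
    "kiss": None,
    "John": "kiss",
    "cheek": "kiss",
    "Mary": "cheek",
}


def root(heads):
    """Return the single word that has no head."""
    roots = [w for w, h in heads.items() if h is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    return roots[0]


def dependents(heads, word):
    """Return the words that hang directly below `word` in the graph."""
    return sorted(w for w, h in heads.items() if h == word)


def path_to_root(heads, word):
    """Follow head links from `word` up to the root."""
    path = [word]
    while heads[word] is not None:
        word = heads[word]
        path.append(word)
    return path


print(root(HEADS))                            # kiss
print(dependents(HEADS, "kiss"))              # ['John', 'Mary', 'on']
print(path_to_root(HEADS, "cheek"))           # ['cheek', 'on', 'kiss']
print(dependents(HEADS_POSSESSIVE, "cheek"))  # ['Mary']
```

Comparing the two tables shows the only structural difference between the two sentences: "Mary" hangs from "kiss" in the first and from "cheek" in the second.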

There is an entire manual devoted to specifying exactly how this graph should look for any particular sentence. You should be familiar with that manual.

Each word node also contains additional information ("features"). Immediately below the word is its Part of Speech (such as Noun, Verb, Adjective, or Adverb). Below that is a space for the Grammatical Role (such as Subject, Object, or Indirect Object). There is also morphological information (such as number, tense, person, and mood), a space for comments, and a space for the annotation information that you will add to the structure.

Here are the parts of speech that can show up in a dependency structure and their abbreviation in the graph:

Symbol   Part of Speech
N        Noun
V        Verb
Adj      Adjective
Adv      Adverb
Pron     Pronoun
Num      Number
Pun      Punctuation
Sym      Symbol
Uh       Hesitation
Misc     Miscellaneous

Here are the grammatical roles that can show up and their abbreviations:

Symbol   Grammatical Role
Subj     Subject
Obj      (Direct) Object
Obj2     (Indirect) Object
Mod      Modifier
Root     Root

VI. CHANGING THE DEPENDENCY STRUCTURE GRAPH

THE RULE IS "DO NOT CHANGE ANYTHING IN THE INPUT": DON'T MODIFY THE DEPENDENCY STRUCTURE, DON'T CHANGE THE PART OF SPEECH, DON'T CHANGE ANY FEATURES.

At some future time, you may be responsible for making sure that the tree structure is correct (from your point of view). That includes verifying the dependency structure and the additional feature information. However, for this initial exercise we are taking the dependency tree structure as a given. So, for this exercise, you don't modify the tree or its features.

Changing the tree structure is done in the tree editor (tred), which you can invoke independently or by clicking on "Show Graph" in the annotool. Tred needs its own instruction manual, but since you will not need to modify the structure for this exercise, it is not covered here.

(The tree structure is first produced automatically by a parsing program and then examined and corrected by an expert. However, there is no guarantee that the expert noticed and corrected everything properly, and more than one parse may be reasonable. If you disagree with the input you receive, make a note of it and send it to Owen Rambow, rambow@cs.columbia.edu.)

VII. ANNOTATING THE TEXT

Annotation involves two parts. The first is selecting the semantic concept from the Omega ontology that most closely represents the meaning of each lexical item in context. For this exercise, you will annotate only nouns, verbs, adjectives, and adverbs. Do not annotate words labelled with any other part of speech. In particular, you do not need to annotate proper nouns. Subsequent sections of this manual explain how to do this.

The second part involves annotating the dependents of verbs with the semantic role they play in the action or event. Section X, "Annotating Thematic Roles", explains how to do this.

You should be able to annotate the entire text from within the annotool itself. However, you CAN also do the annotation by hand in the tree editor (tred). To make changes of any sort to the graph or features in tred, you must first change the menu slot in the upper right-hand corner of the tred window from "Analytic" to "Analytic_Correction". If you do not do this, you will not be permitted to make any changes to the displayed graph.

If you do make changes in tred, they won't be saved unless you click the "Apply" button in the annotool. This updates the working file in the annotool to reflect the changes you made in the tree editor.

HOWEVER, it also overwrites any changes you made to the file in the annotool since you last invoked the tree editor, so you may lose work if you click "Apply" at the wrong time. That is why it is a good idea to close the tree editor every time you stop using it. It is quick and easy to invoke again, and closing it avoids the possibility of doing something you don't want.

VIII. ASSISTANCE IN ANNOTATING

In addition to the annotool and the tree editor (tred), you may find it useful to have some additional information available to help with your task. You can set these up from the IAMTC Annotators web page, perhaps as separate tabs within one browser window.

The information that it might be useful to have on hand could include: (1) the list of the theta (semantic) roles, their meaning, and some example sentences; (2) the Omega browser, which gives more information about the ontology than the annotool window; (3) the manual which describes what the semantic dependency structure should look like; and (4) possibly this manual, though soon enough you will have it all memorized.

IX. ANNOTATING ONTOLOGICAL CONCEPTS

The first task you will perform is to annotate every noun, verb, adjective, and adverb with a set of ontological concepts. These are, in essence, "meanings" arranged in a tree-like hierarchy, moving up from more specific to more general. For example, there may be a concept of "Yorkshire terrier." This concept is a subset of the concept "dog", which is a subset of the concept of "mammal", etc. Your task is to find the most specific concept in the ontology that fits the meaning of the lexical item you are annotating.
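The "most specific concept that fits" rule can be sketched on a toy is-a hierarchy. The fragment below only mirrors the "Yorkshire terrier" example above; the names are not real Omega concept identifiers, and the annotator, not code, decides which concepts fit. Given the concepts that fit, the sketch drops any concept that is merely a more general ancestor of another fitting one.

```python
# Toy fragment of an is-a hierarchy (child -> parent), following the
# "Yorkshire terrier" example in the text. Not real Omega concept names.
PARENT = {
    "Yorkshire terrier": "dog",
    "dog": "mammal",
    "mammal": None,  # top of this toy fragment only
}


def ancestors(concept):
    """Return the more general concepts above `concept`, nearest first."""
    chain = []
    parent = PARENT.get(concept)
    while parent is not None:
        chain.append(parent)
        parent = PARENT.get(parent)
    return chain


def most_specific(fitting):
    """Drop any fitting concept that is an ancestor of another fitting one."""
    more_general = {a for c in fitting for a in ancestors(c)}
    return [c for c in fitting if c not in more_general]


print(ancestors("Yorkshire terrier"))                         # ['dog', 'mammal']
print(most_specific(["mammal", "dog", "Yorkshire terrier"]))  # ['Yorkshire terrier']
```

Concepts that are not ancestors of one another all survive, which is consistent with selecting several closely related concepts when the merged ontology offers them.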

You might think of concepts associated with nouns as objects or classes of objects, concepts connected to verbs as events or event-types, concepts connected to adjectives as properties, and concepts connected to adverbs as manners, ways of doing things. This is, of course, overly simplistic.

The lexical item itself is the "key" into the ontology -- each lexical item is connected to a number of concepts in the Omega ontology. This is partially because lexical items out of context can have several different meanings: "bank" can refer to a financial institution, the edge of a river, the act of putting money into a bank, or to the act of a pilot turning an airplane to the left or right. When you look at a word in context, you should be able to figure out which meaning is in play.
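A small sketch of the "word as key" idea: one lexical item leads to several candidate concepts, and knowing the part of speech in context already narrows the list. The sense labels and glosses below are invented for illustration from the "bank" example; they are not real Omega concept names, and this is not how the annotool queries Omega.

```python
# Hypothetical, hand-written sense inventory built from the "bank" example.
# Labels are invented for illustration and are NOT real Omega concepts.
LEXICON = {
    "bank": [
        ("bank-financial-institution", "noun", "a financial institution"),
        ("bank-river-edge", "noun", "the edge of a river"),
        ("bank-deposit", "verb", "to put money into a bank"),
        ("bank-aircraft-turn", "verb", "of a pilot, to turn an airplane left or right"),
    ],
}


def candidates(word, pos=None):
    """Return the sense labels for `word`, optionally filtered by part of speech."""
    senses = LEXICON.get(word, [])
    return [label for label, p, _gloss in senses if pos is None or p == pos]


print(candidates("bank"))          # all four senses
print(candidates("bank", "verb"))  # ['bank-deposit', 'bank-aircraft-turn']
```

Choosing between the remaining senses still depends on reading the word in its sentence.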

The Omega ontology that we are using is itself a combination of other ontologies. One of these is WordNet, a lexical ontology built at Princeton. The other is Ontosem/Mikrokosmos, a translation-based ontology built at CRL and UMBC. At the highest levels of the Omega ontology, concepts were merged, but at lower levels this was not always possible. As a result, there are often concepts derived from these two predecessor ontologies that are either identical or very close in meaning. In addition, WordNet itself may have concepts that are very close in meaning. In Omega, WordNet concepts are in lowercase letters and Mikrokosmos concepts in capital letters.

Because of the origin of Omega, you may not be able to select _one_ concept that is the best or only concept for a lexical item. So you are permitted to select as many as you think are appropriate.

Here is how you will do this.

When you loaded a text, one sentence was highlighted and also appeared in the small box below the big one on the left of the annotool. Take a moment to think about the real meaning of the sentence. Make sure you understand it. Try to think of other ways to communicate the same meaning: that is, paraphrase the sentence. Do this for each sentence as you proceed through the text.

A number of the words in that box will have a gray background. Pick one to annotate and click on it; it will be highlighted in yellow. In addition, the English translation of the word will appear in the box on the upper right side of the tool, labelled "Node you selected." If the IL0 file has more than one translation for the word, they all appear in the list box, and you can choose the appropriate one from the list. If none of them is suitable, you can enter a good English word for the Korean word yourself.

In the box below, "Stemming" should be set to "disabled", since the word you are looking at will already be in its citation form. That is, if the word in the text is a plural noun, the singular form will show up in "Node you selected". Similarly, if a verb is third person present or past tense, it will show up in the infinitive form in that box. Capitalization is important, but it should also be handled automatically. Still, you may want to check that the word in the box is capitalized appropriately.

The next step is to click on the box labelled "Direct query." This sends the word to Omega, which returns all the concepts associated with that word. They appear as a list in the pull-down menu and in full form in the large box in the lower right of the tool.

It is possible that there may be no concepts associated with the lexical item. A common case is multi-word lexical items such as noun-noun compounds: the tool groups the nouns of a compound together into one node. For example, "child safety seat" will be just one node, and so will "airplane seat belt". Neither of these concepts is in Omega, so only "Dummy Concepts" show up in the window. In such cases, generalize to the next higher category. In these examples, a child safety seat is a kind of safety seat, while an airplane seat belt is a kind of seat belt. Querying Omega shows that "seat belt" is a concept, while "safety seat" is not. So in the first case, we can generalize again and query Omega for "seat".
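The generalization strategy above can be sketched as a loop that drops the leftmost modifier of the English compound until a query succeeds. This assumes the rightmost word is the head, which holds for typical English noun-noun compounds like these examples; the `KNOWN` set is a hypothetical stand-in for actually querying Omega, and `generalize_compound` is an illustrative name, not an annotool feature.

```python
# Hypothetical stand-in for Omega lookups; in practice you query Omega
# with "Direct query" in the annotool.
KNOWN = {"seat belt", "seat"}


def generalize_compound(compound, known=KNOWN):
    """Drop leftmost modifiers until a query matches; return (match, queries tried).

    Assumes the rightmost word is the head of the English compound.
    Returns (None, tried) when nothing matches, i.e. use the Dummy concepts.
    """
    words = compound.split()
    tried = []
    while words:
        query = " ".join(words)
        tried.append(query)
        if query in known:
            return query, tried
        words = words[1:]  # generalize: drop the leftmost modifier
    return None, tried


print(generalize_compound("airplane seat belt"))
# ('seat belt', ['airplane seat belt', 'seat belt'])
print(generalize_compound("child safety seat"))
# ('seat', ['child safety seat', 'safety seat', 'seat'])
```

The list of queries tried mirrors the manual's reasoning: "airplane seat belt" stops at "seat belt", while "child safety seat" has to go one step further, to "seat".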

You can also check variations in spelling, word division, and hyphenation, and you can generalize other lexical items in the same way. For instance, "recapitalize" is not an Omega concept, but "capitalize" is. As a last resort, you might query synonyms of the lexical item to see if they are in Omega.

If you are unable to find a reasonable set of concepts to choose from, select both Dummy concepts (one for WordNet and one for Mikrokosmos). By "reasonable set of concepts" we mean concepts that would reasonably answer somebody's question, "What kind of event was that?" or "What kind of object is that?" So, if you are looking at concepts like "object", "event", or "non-agentive motion event", you are looking too high in the ontology (unless, of course, the lexical item itself is that vague, such as the word "event" or "object").

Once you have a reasonable set of concepts to work with, your task is to select those concepts that fit the meaning of the lexical item in context. When you find a concept that fits, click on it (it will be highlighted in yellow) and then click the "Add/Remove" button at the bottom of the tool. It will then show with a gray background to indicate that it has been selected. Deselecting works the same way: click on the concept name (it will be highlighted), then click on "Add/Remove" and the concept will no longer have a gray background.

In the larger window, each concept has additional information. These are (1) the part of speech appropriate for the lexical items connected to that concept; (2) an English definition of the meaning of that concept; (3) other lexical items connected to that concept (essentially, synonyms); (4) example sentences; and (5) the parent node above the concept (the next most general category to which the concept belongs) and its definition.

A concept fits and should be selected if the meaning of the lexical item in context is compatible with all five bits of information. That is: (1) it is the right part of speech; (2) the definition is appropriate for the annotated word in context; (3) you could substitute the synonyms for the word in the sentence and still have approximately the same meaning; (4) you could fit the word into the example sentences and the word would have the same meaning as in the sentence in the text; and (5) the parent node is a reasonable generalization of the meaning of the word you are annotating.
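The selection rule is a strict conjunction: a concept is selected only if all five checks pass. The sketch below records the annotator's five judgments for each candidate and keeps only the concepts that pass every one. `ConceptCheck`, `select_concepts`, and the concept names are hypothetical; the judgments themselves still come from reading the concept information in the tool.

```python
from dataclasses import dataclass, astuple


@dataclass
class ConceptCheck:
    """Annotator judgments for one candidate concept (hypothetical record)."""
    right_pos: bool            # (1) right part of speech
    definition_fits: bool      # (2) definition fits the word in context
    synonyms_substitute: bool  # (3) synonyms can replace the word
    examples_fit: bool         # (4) word fits the example sentences
    parent_generalizes: bool   # (5) parent is a reasonable generalization

    def fits(self):
        # A single failed check is enough to reject the concept.
        return all(astuple(self))


def select_concepts(checks):
    """Return the names of all candidate concepts that pass every check."""
    return [name for name, check in checks.items() if check.fits()]


# Hypothetical candidates: concept-b fails only the synonym test.
checks = {
    "concept-a": ConceptCheck(True, True, True, True, True),
    "concept-b": ConceptCheck(True, True, False, True, True),
}
print(select_concepts(checks))  # ['concept-a']
```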

Do not pay attention to the name of the concept itself. This often has special material in it (following a > or a <) to identify that specific concept. That material is NOT part of the meaning of the concept and could mislead you.

If a concept does not fit, that is, if it is incompatible with even one of the five pieces of information, do not select it.

Once you have selected all the concepts you think are appropriate, (and, for verbs, edited the role assignment as necessary) click the "Accept" button.

X. ANNOTATING THEMATIC ROLES

When you click on a word to annotate, you will notice that other words in the sentence are underlined in red. These are the lexical items that depend on the word you are annotating, that is, the words that hang down from it in the graph.

If the word you are annotating is a verb, you will be asked to mark a thematic role for each of its dependents (though the role may be "none"). Thematic roles, also called semantic roles or theta roles, are categories of participants in events; when they are presented in table format, the table is often called a "theta grid". You will assign each dependent of a verb an appropriate thematic role.

Some concepts will have a theta grid associated with them. In that case, the larger box on the right shows which dependents of the verb are assigned which thematic role. If you are happy with that assignment, you don't need to do anything.

Frequently, however, verbs will not have an associated theta grid, or it will be incorrectly assigned, or it will be missing assignments (for instance, it will never mark adverbs or prepositional phrases that indicate TIME or LOCATION).

In these cases, you must click on "Edit Role Assignment." A pop-up window will allow you to assign any thematic role to any dependent. Once you are satisfied with your assignments, click OK to close the assignment window. Once you are satisfied with the assignments for every concept you have selected, you may go ahead and hit the "Accept" button.
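The combined effect of a theta grid and "Edit Role Assignment" can be sketched as a merge: your edits override the grid, and any dependent that neither covers gets "none". The function name and role labels are placeholders: TIME and LOCATION are mentioned in this manual, while AGENT and THEME are illustrative only, so use the project's list of theta roles for real annotation.

```python
# Placeholder role labels: TIME and LOCATION appear in the manual;
# AGENT and THEME are illustrative only. Use the project's theta-role list.


def assign_roles(dependents, grid=None, edits=None):
    """Combine a concept's theta grid with the annotator's edits (hypothetical helper).

    Edits override the grid; any dependent left unassigned gets "none".
    """
    grid = grid or {}
    edits = edits or {}
    return {dep: edits.get(dep, grid.get(dep, "none")) for dep in dependents}


# Hypothetical example for "John kissed Mary yesterday": the grid covers
# subject and object but, as the manual warns, not the time adverb.
deps = ["John", "Mary", "yesterday"]
grid = {"John": "AGENT", "Mary": "THEME"}
edits = {"yesterday": "TIME"}
print(assign_roles(deps, grid, edits))
# {'John': 'AGENT', 'Mary': 'THEME', 'yesterday': 'TIME'}
```

When a concept has no theta grid at all, every dependent starts as "none" and must be assigned by hand.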

You are now ready to proceed to another word.

XI. FINISHING UP

When you are leaving a session, be sure to save your work, and then exit the tool using the "Exit" selection on the pull-down menu under "File" in the upper left-hand corner of the tool.

When you have completed your work on a file, you may upload it to the annotators' wiki. When you do this, be sure to include the special wiki preformatting command "<<<" on a line before the text and ">>>" on a line after it.
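If you prepare the upload text outside the browser, the wrapping step is simple enough to script. The sketch below only adds the "<<<" and ">>>" lines around the file contents; `wrap_for_wiki` is a hypothetical helper, and reading the file as UTF-8 in the commented usage line is an assumption about how your .fs files are encoded.

```python
def wrap_for_wiki(fs_text):
    """Surround file contents with the wiki's "<<<" / ">>>" preformatting lines."""
    body = fs_text.rstrip("\n")  # avoid a blank line before ">>>"
    return "<<<\n" + body + "\n>>>\n"


# Hypothetical usage (filename and UTF-8 encoding are assumptions):
#   with open("my_annotated_file.fs", encoding="utf-8") as f:
#       print(wrap_for_wiki(f.read()))

print(wrap_for_wiki("line 1\nline 2\n"))
```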


Version 4, Mon 16 Aug 2004 21:29:57 - created Mon 16 Aug 2004 19:28:42