| Sunday Notes | IAMTC |
NP Exercise
what does the NP exercise show?
today, this year -- 1 or two concepts?
Jap: one word for "today", for "this week", for "this year"?
2 ways: consistent: units + deictic unit, or always one
inconsistent: concept for today, then composition
ontology: has general mechanism, and some "shortcuts" such as "today"
bigram info (ISI): what MI cutoff?
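A minimal sketch of what the MI cutoff question amounts to, assuming "MI" here means pointwise mutual information over adjacent word pairs. The function names, the toy sentence, the cutoff value, and the `min_count` filter are all illustrative assumptions, not the ISI setup.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Return {(w1, w2): (pmi, count)} for every adjacent word pair."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        # PMI = log2( P(x, y) / (P(x) * P(y)) )
        scores[(w1, w2)] = (math.log2(p_xy / (p_x * p_y)), count)
    return scores

def above_cutoff(tokens, cutoff, min_count=2):
    """Keep bigrams whose PMI reaches the cutoff and that occur often enough."""
    return {
        pair: pmi
        for pair, (pmi, count) in bigram_pmi(tokens).items()
        if pmi >= cutoff and count >= min_count
    }

# Toy input, not project data.
toy_tokens = "this year the crown prince met the crown prince this year".split()
toy_candidates = above_cutoff(toy_tokens, cutoff=2.0, min_count=2)
```

In this toy input a pair seen once ("prince met") gets the same PMI as a pair seen twice ("crown prince"), so a raw cutoff alone may not be enough; the frequency floor is one possible guard.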
EH: middle ground: metaphorical meaning of one N -> non-compositional
Japanese: some cases of 1 word J, 2 words E (pattern A plus bad harvest)
Arabic:
Problem for Arabic: only synt issues, but prepositions must go in IL2
"e-commerce" type examples, become 2 words
Crown Prince <-> wli (local saint, successor) 3ld (Age) non-compositional in A --> crucial case
issue: compositional in language A as foreign term in language B ("joint venture" in Spanish)
issue: mistranslations
Hindi:
nothing interesting, but maybe effect of genre; issue of mistranslations
Korean:
service quality example: mistranslation, or inference, or synonyms?
Spanish:
real estate -- inmobiliaria
French
bonne partie de l'annee 1993
conjecture
Teruko summary
need a way of inputting single concepts for certain languages where E has only two concepts
Omega: Lessons Learned from NP Exercise
Omega becomes merge of all languages, IL1 of all languages
Ed: should we move in Omega to IL2?
prototypical example: "this year"
2 questions:
1) IL 1 annotation -- what to do when concept is missing?
2) IL 2 annotation -- how is it annotated?
ad 1 -- we must allow annotators to add concepts in any case; allow them to relate concepts to other concepts; links introduced by annotator: "IS-A" and "unspecified"
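One way to picture "ad 1" is a small data structure in which annotators can add a concept and relate it to an existing one using only the two link types named above. This is a sketch under assumptions: the class names, the `added_by_annotator` flag, and the concept names YEAR and THIS-YEAR are hypothetical and not part of Omega.

```python
from dataclasses import dataclass, field

# The only link types an annotator may introduce, per the notes.
ALLOWED_LINKS = {"IS-A", "unspecified"}

@dataclass
class Concept:
    name: str
    added_by_annotator: bool = False
    links: list = field(default_factory=list)  # (link_type, target_name) pairs

class Ontology:
    def __init__(self):
        self.concepts = {}

    def add_concept(self, name, added_by_annotator=False):
        """Add a concept if it is missing; return the stored concept."""
        if name not in self.concepts:
            self.concepts[name] = Concept(name, added_by_annotator)
        return self.concepts[name]

    def relate(self, source, link_type, target):
        """Link two existing concepts with an allowed link type."""
        if link_type not in ALLOWED_LINKS:
            raise ValueError(f"link type not allowed: {link_type}")
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both concepts must exist before linking")
        self.concepts[source].links.append((link_type, target))

# Hypothetical usage with the prototypical "this year" case.
onto = Ontology()
onto.add_concept("YEAR")
onto.add_concept("THIS-YEAR", added_by_annotator=True)
onto.relate("THIS-YEAR", "IS-A", "YEAR")
```

Restricting annotators to two link types keeps their additions easy to review later; anything richer would be left to whoever maintains the ontology.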
(note: special issue about "this" vs open-class items)
ad 2 -- topic for Monday
Ontology: General Issues
EH:
Background
1) Ontobank -- moving along well, identifying senses that can be tagged with high IAA, start with PropBank senses; nouns (at NYU): same thing; aim: thousands of words by 2005
Ontobank: driven by lexica
IAMTC: driven by texts
other difference: of course, multilingual
Idea: as Omega changes, automatically update IAMTC annotations
Cobuild: based on corpus studies
We can use Ontobank experience, but we bring in multilingual experience
Omega aims for 60,000 concepts
ISI: Nick White, doing Omega updating, based on Penn PropBank data, stat data, IAMTC feedback
Publications
Workshops at ACL
- Graeme and Sergei on text meaning (Eduard presents on panel)
- informal meetings with other projects
- multiword expression: no one going
- SENSEVAL: no one going!
we need to differentiate ourselves and make something unique
how? concentrate on IL2 issues
multilingual gives us perspective on:
- what is in ontology?
- multiword expressions
- syntactic divergences/paraphrases
- microtheories for different phenomena: scalars, time, epistemic statuses (modality, aspect,...), etc.
Omega: How to Use with Foreign Language
2 issues
Multilingual WordNets
Should we use them?
Experience: David F used Spanish WN for some workshop
EuroWordNet: 8 languages (inc Eng) and interlingual index
-- Digression: Microtheory: number
options:
- we can create separate WNs and have interlingual index
- create one thing, but keep things different
- create one thing, semanticize
Foreign Language IL1 Annotation Process and Omega Extension proposal:
Do both steps:
A) use multilingual WN, lexicon, or other sense inventory: if present, choose; if not present, do nothing
B) check if in Omega: if present, choose; if not, create bug spec for Omega czar
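Steps A and B can be sketched as one function, assuming both the foreign sense inventory and Omega can be queried by word form and return candidate lists. How foreign words actually index into Omega is left open in these notes, so the dictionaries, the `choose` callback, and the toy entries below are hypothetical.

```python
def annotate_il1(word, foreign_inventory, omega, choose):
    """Run step A (foreign sense inventory) and step B (Omega) for one word.

    foreign_inventory, omega: dict mapping a word to a list of candidates.
    choose: callable standing in for the annotator picking one candidate.
    """
    result = {"word": word, "foreign_sense": None,
              "omega_concept": None, "bug_spec": None}

    # Step A: if the sense inventory has candidates, choose; otherwise do nothing.
    senses = foreign_inventory.get(word, [])
    if senses:
        result["foreign_sense"] = choose(senses)

    # Step B: if Omega has candidates, choose; otherwise write a bug spec for the czar.
    concepts = omega.get(word, [])
    if concepts:
        result["omega_concept"] = choose(concepts)
    else:
        result["bug_spec"] = {
            "word": word,
            "foreign_sense": result["foreign_sense"],
            "note": "no Omega concept found",
        }
    return result

# Toy data for illustration only.
toy_inventory = {"inmobiliaria": ["inmobiliaria#1"]}
toy_omega = {"inmobiliaria": ["REAL-ESTATE"]}
pick_first = lambda candidates: candidates[0]

found = annotate_il1("inmobiliaria", toy_inventory, toy_omega, pick_first)
missing = annotate_il1("wli", toy_inventory, toy_omega, pick_first)
```

This version presents the two candidate lists one after the other; whether the tool should instead show them in a single step is the implementation question raised next.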
Implementation question: should list of candidate senses be represented in two steps or in one?
Other issue: should choice of foreign concept entail limitation of Omega concepts if links are present?
Point: using a bilingual dictionary can serve as sense inventory for step A; is this the preferred way? an empirical question, need to determine on the basis of available resources for each language
ISSUE OF CONTENTION: how and whether annotators insert into Omega
proposal: annotator creates spec of new node in Omega, Omega czar enters the new node (later)
crucial question: are foreign WNs indexed into E WN?
Possible objections:
- semanticist: this is just info about a bunch of languages
- linguists: this is kind of interesting, but things are mushed together in unpredictable ways
- computational people: how do we use this?
Need to address these objections; main response: we have annotated text!
Question: How do annotators know that something is not in Omega? How far do you have to search?
Answer: annotators don't search, tool gives them a list of options, and annotators choose
IL1 and IL2 use the same (evolving) ontology
J: foot-leg no thumb: "thumb" and "foot-leg"
E: big toe: "big" and "toe"
Ambiguity/vagueness test: John lives near a bank[fin-inst] and Mary does too [river]
multiword expressions: "hot dog"?
At IL0? IL1? ***********UNRESOLVED************
"hot dog" as one node in IL0
pro:
- segmentation issue: cannot insert words between parts
- morphological restrictions on dependents
- make it easier on annotators
- why not?
contra:
- practical reasons: interface with as many parsers as possible
- we lose info that may be useful: different MWE lists in two languages with identical string sequences
- morphology in "kick the bucket" -- other languages can have same effect in Adj-N combos
- just a notational difference
- morphological restrictions on dependents -- independent of whether it is a MWE or not, irrelevant argument
Arc Labels
Lori presentation
3 groups of cases:
- traditional valency alternations for one verb
- FrameNet-type cases
- "random" variation (scripts etc)
4 groups of approaches:
- PropBank approach
- LCS/localist approach
- FrameNet-type
- "random" variation (scripts etc)
But: need to distinguish annotation procedure from scientific content
Annotation approaches:
- use lexicon
- do not use lexicon
need to decide:
* will corpus be used to study sem role -> synt role mappings?
* model interaction with tense and aspect for inferences?
Possible Solution
OMEGA CZAR ACTIVITY
1) Import a chosen set of FrameNet frames into Omega
2) Omega czar chooses names for all verb meanings that fall under frames
3) Omega czar chooses LCS-style labels for all other motion metaphor concepts
4) For all remaining concepts, use "dresser"/"dressee" style labels
ANNOTATOR ACTIVITY
1) Choose concept
2) If a concept does not have a set of argument labels, annotator does nothing and notifies Omega czar
3) If there is a set of argument labels, map arguments & adjuncts to labels
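The annotator steps above can be sketched as follows, assuming Omega exposes a set of argument labels per concept and that notifying the czar can be modeled as appending to a queue. The function name, the queue, and the DRESS and HOT-DOG entries are hypothetical; only the "dresser"/"dressee" label style comes from the notes.

```python
def annotate_arguments(concept, proposed, label_sets, czar_queue):
    """Map arguments and adjuncts to the concept's labels, or notify the czar.

    proposed: dict mapping argument text to the label the annotator picked.
    label_sets: dict mapping a concept name to its list of argument labels.
    czar_queue: list collecting concepts that still lack a label set.
    """
    labels = label_sets.get(concept)
    if not labels:
        # Step 2: no label set, so annotate nothing and notify the Omega czar.
        czar_queue.append(concept)
        return {}
    # Step 3: accept the mapping only if every label belongs to the concept.
    unknown = [label for label in proposed.values() if label not in labels]
    if unknown:
        raise ValueError(f"labels not defined for {concept}: {unknown}")
    return dict(proposed)

# Toy data for illustration only.
toy_label_sets = {"DRESS": ["dresser", "dressee"]}
toy_queue = []

mapped = annotate_arguments(
    "DRESS", {"Mary": "dresser", "the child": "dressee"}, toy_label_sets, toy_queue
)
unmapped = annotate_arguments("HOT-DOG", {"it": "thing"}, toy_label_sets, toy_queue)
```

Keeping label creation out of this function mirrors the split above: annotators only pick from existing labels, and gaps are routed to the czar.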
Q: does concept BUY still exist once we have concept COMMERCIAL-TRANSACTION? At IL2, should the annotator choose a new concept and new arc labels?
We should look at some more examples.