Checklist for Producing IL0 from
Connexor Parses
- Delete Words
- Determiners: words such as a,
an and the; usually with POS=Det
- Keep quantifiers such as all. Also keep (for
now) demonstratives such as this.
- Punctuation. EXCEPT for semantically meaningful cases:
- Quotes after communication verbs such as say, tell,
proclaim, etc.
- Commas acting as conjunctions: Mary, John, Paul, and Burt.
Note here that the third comma before and is deleted.
Also, the POS for the remaining commas should be Conj.
- Auxiliaries: words such as will, would, be, have,
do.
For example, John could have been playing golf becomes
John could playing golf. If the main verb is do, have
or be, do not delete it (yet).
- Keep modal auxiliaries such as could
and might.
- Words with zero meaning such as
- that
in I requested that Mary stays.
But not when it is meaningful as a relativizer: the dog that
Mary bought is cute. Here, that refers to dog
(well, one can argue differently, but let's just say it does, ok?).
- to
in I wanted to play golf. But not
when to is a preposition, as in I went to school.
- Expletive
pronouns: it, there
- In It is
hoped that John will live, the it
is non-referential and stands in for the surface subject, which is the live
clause. (The deep label of the clause is
object, since the verb is passive.)
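The determiner and auxiliary rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Token class, the word lists and the is_main_verb flag are hypothetical and not part of Connexor's output format, and the sketch works on a flat word list rather than a parse tree. Punctuation, zero-meaning words and expletives are left out because they need the annotator's judgment.

```python
from dataclasses import dataclass

# Hypothetical word lists: the checklist gives examples, not full inventories.
DETERMINERS = {"a", "an", "the"}
AUXILIARY_LEXEMES = {"will", "would", "be", "have", "do"}
MODALS = {"could", "might"}  # modal auxiliaries are kept


@dataclass
class Token:
    word: str
    lexeme: str
    pos: str
    is_main_verb: bool = False  # assumed to be set by the annotator


def keep(tok: Token) -> bool:
    """Return True if the token survives the Delete Words step."""
    w = tok.word.lower()
    if w in MODALS:
        return True
    if tok.pos == "Det" and w in DETERMINERS:
        return False
    # do, have and be are kept when they are the main verb
    if tok.lexeme in AUXILIARY_LEXEMES and not tok.is_main_verb:
        return False
    return True


def delete_words(tokens):
    return [t for t in tokens if keep(t)]


sentence = [
    Token("John", "John", "N"),
    Token("could", "could", "V"),
    Token("have", "have", "V"),
    Token("been", "be", "V"),
    Token("playing", "play", "V", is_main_verb=True),
    Token("golf", "golf", "N"),
]
result = " ".join(t.word for t in delete_words(sentence))
print(result)  # John could playing golf
```

In a real parse tree, the children of a deleted node would also have to be re-attached to a surviving node; the sketch does not attempt that.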
- Combine Words
Combine the nodes into one node. Word
and Lexeme features should be adjusted to contain all
words separated by underscores "_". This should happen to:
- Proper Nouns:
All noun sequences in capital letters, names of geographical
areas, institutions, people, abbreviations, etc. For example, Howard_Dean,
middle_east, West_Bank,
Gulf_Information_Technology_Exchange, Dubai_Internet_City,
The_Bank_of_America, Defense_Minister. Make POS=PN.
- Phrasal verbs:
such as set_out, look_after, cut_off.
A "phrasal verb" (also called "particle verb") is a verb with
something that looks like a preposition, but isn't, because it can come
before OR after the object: throw out the garbage and throw the
garbage out. YOU MUST DO THIS TEST IN YOUR HEAD TO
DETERMINE IF THE VERB IS A PARTICLE VERB. If the verb is
intransitive and has this kind of particle, it is clearly a particle
verb. If the verb only allows V+P+N order, then it is not a
particle verb, the word is in fact a preposition (and not a
particle), and you do not combine the preposition with the verb. An example of
a verb with a preposition that does not constitute a particle verb is run into
the room, since we cannot say *run the room into.
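As a rough illustration of the Combine Words step, the following Python sketch merges several nodes into one and joins their Word and Lexeme features with underscores. The Node class and the combine function are hypothetical names chosen for this example. Deciding whether a verb really is a particle verb still requires the mental test described above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    word: str
    lexeme: str
    pos: str


def combine(nodes, pos=None):
    """Merge nodes (given in the desired order) into a single node.

    Word and Lexeme are joined with "_". If pos is not given, the POS of
    the first node is reused (an assumption made for this sketch).
    """
    if not nodes:
        raise ValueError("need at least one node to combine")
    return Node(
        word="_".join(n.word for n in nodes),
        lexeme="_".join(n.lexeme for n in nodes),
        pos=pos if pos is not None else nodes[0].pos,
    )


# Proper noun: POS becomes PN.
name = combine([Node("Howard", "Howard", "N"), Node("Dean", "Dean", "N")], pos="PN")
# Particle verb: the verb keeps its POS.
verb = combine([Node("threw", "throw", "V"), Node("out", "out", "Prep")])

print(name.word, name.pos)     # Howard_Dean PN
print(verb.word, verb.lexeme)  # threw_out throw_out
```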
- Restructure
Correct parse errors of
all kinds. Here are some examples of things that you will
probably need to correct, but this list is not exhaustive.
Copular constructions
- Identify constructions where
there is an explicit or implicit verb to be.
- Identify the subject and the
predicate of the verb to be. Predicates come after the
verb to be and qualify subjects, which come before. The
subject and predicate of each example are given in parentheses: John is a
nice man (John, a nice man); John will be tall (John, tall); John caused Mary to
be happy (Mary, happy); John made Mary
(be) happy (Mary, happy); Mary considered John (to be) an honest man (John, an honest man).
- Delete the verb to be if
present. Make the subject a child of the head of the predicate. For
example, in John is a nice man, John is a child of man. The
relation of the subject to the predicate is Subj. The
relation of the predicate to its parent can be Subj, Obj,
Root or Mod. Connexor might produce the relation
Pred. This is illegal and it should be converted
to Mod.
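The copular steps above might look roughly like this in Python. The sketch assumes a hypothetical Node class and assumes that the parse attaches the subject (Subj) and the predicate (Pred) as children of the be node; actual Connexor output may be shaped differently and need manual correction first.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    word: str
    rel: str  # relation to the parent
    children: List["Node"] = field(default_factory=list)


def find(node, rel):
    """Return the first child with the given relation, or None."""
    return next((c for c in node.children if c.rel == rel), None)


def remove_copula(be: Node) -> Node:
    """Delete a 'be' node and return the predicate head that replaces it."""
    subj = find(be, "Subj")
    pred = find(be, "Pred")
    if subj is None or pred is None:
        return be  # not a shape this sketch handles; leave it for the annotator
    # The predicate takes over be's relation; Pred itself is illegal, so use Mod.
    pred.rel = "Mod" if be.rel == "Pred" else be.rel
    # The subject becomes a child of the predicate head.
    pred.children.insert(0, subj)
    # Any other dependents of be (for example a kept modal) move along.
    pred.children.extend(c for c in be.children if c is not subj and c is not pred)
    return pred


# John is a nice man (the determiner is already deleted)
be = Node("is", "Root", [
    Node("John", "Subj"),
    Node("man", "Pred", [Node("nice", "Mod")]),
])
head = remove_copula(be)
print(head.word, head.rel)                        # man Root
print([(c.word, c.rel) for c in head.children])   # [('John', 'Subj'), ('nice', 'Mod')]
```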
- Root Selection
There
should be one dummy root at the top which has the whole sentence in it.
Under that dummy root, there should be one real root. Only One! even if
the parser returns many. So, one node must be selected as real root and
all other nodes go under it.
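A minimal sketch of root selection, assuming a hypothetical Node class and a dummy root whose children are the roots returned by the parser. Which node becomes the real root is the annotator's decision, so it is passed in as an index. The checklist does not say which relation the demoted roots should receive; Mod is used here purely as a placeholder.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    word: str
    rel: str
    children: List["Node"] = field(default_factory=list)


def select_root(dummy: Node, pick: int = 0, attach_rel: str = "Mod") -> Node:
    """Keep exactly one real root under the dummy root.

    All other former roots are moved under the chosen one with the
    relation attach_rel (a placeholder choice, not given by the checklist).
    """
    if not dummy.children:
        return dummy
    real = dummy.children[pick]
    for i, other in enumerate(dummy.children):
        if i != pick:
            other.rel = attach_rel
            real.children.append(other)
    real.rel = "Root"
    dummy.children = [real]
    return dummy


dummy = Node("<root>", "", [Node("launched", "Root"), Node("yesterday", "Root")])
select_root(dummy, pick=0)
print(len(dummy.children))                                    # 1
print([(c.word, c.rel) for c in dummy.children[0].children])  # [('yesterday', 'Mod')]
```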
- Multiple adjectives
can appear in a left-branching or
right-branching chain. Make sure to make them all sisters IF each
adjective can be used by itself with the noun it modifies.
- Special Constructions
Right branching of all elements.
- In journalistic source
identification, such as Dubai 10-28 (FP)- or
Baghdad, October 28th, Reuters -.
- Change co-referentials
words that appear as <pro> should be changed to refer
to the noun. For example: John wanted <pro> to go
--> John wanted <John> go (to is
deleted by the zero-meaning rule). Change in both Word and Lexeme
features.
- Relative construction
such as The man who Mary saw is here or The
man Mary saw is here.
- Depassivization
Add a dummy subject <pro>
if there is no by-agent
(if there is a by-agent, then remove the by
and make the complement of the by the new deep subject) and
make the surface subject an object or indirect object. For example, John
was killed ==> <pro>(Subj) killed John(Obj)
or Mary was given a gift ==> <pro>(Subj)
given Mary(Obj2) gift(Obj)
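The depassivization rule can be sketched as follows. The Node class and function name are hypothetical, the auxiliary be is assumed to have been deleted already, and the by-agent is found naively by looking for a child whose word is by, which would also match a locative such as by the river. The caller decides whether the surface subject becomes Obj or Obj2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    word: str
    rel: str
    children: List["Node"] = field(default_factory=list)


def depassivize(verb: Node, surface_subj_rel: str = "Obj") -> Node:
    """Demote the surface subject and supply a deep subject for a passive verb."""
    surface = next((c for c in verb.children if c.rel == "Subj"), None)
    by = next((c for c in verb.children if c.word.lower() == "by"), None)
    if surface is not None:
        surface.rel = surface_subj_rel  # "Obj", or "Obj2" for a recipient
    if by is not None and by.children:
        agent = by.children[0]  # the complement of "by" is the deep subject
        verb.children = [c for c in verb.children if c is not by]
    else:
        agent = Node("<pro>", "Subj")  # no by-agent: dummy subject
    agent.rel = "Subj"
    verb.children.insert(0, agent)
    return verb


def show(verb):
    return [(c.word, c.rel) for c in verb.children]


killed = depassivize(Node("killed", "Root", [Node("John", "Subj")]))
given = depassivize(
    Node("given", "Root", [Node("Mary", "Subj"), Node("gift", "Obj")]),
    surface_subj_rel="Obj2",
)
by_agent = depassivize(
    Node("killed", "Root", [Node("John", "Subj"), Node("by", "Mod", [Node("Mary", "Obj")])])
)
print(show(killed))    # [('<pro>', 'Subj'), ('John', 'Obj')]
print(show(given))     # [('<pro>', 'Subj'), ('Mary', 'Obj2'), ('gift', 'Obj')]
print(show(by_agent))  # [('Mary', 'Subj'), ('John', 'Obj')]
```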
- Add Arguments
All verbs must have subjects (and
possibly objects and indirect objects) no matter if they are main verbs
or modifying verbs. Some adjectives can have a participle form such as engaged
or daunting; these should be turned into verbs and treated as
such. Non-predicative adjectives do not require subjects.
- Add co-references
for control verbs: for example, the new law allows companies to
own property, charge fares, and hire employees -->
the new law allows companies to own property, <companies>(Subj)
charge fares(Obj), and <companies>(Subj) hire employees(Obj)
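The control-verb example above could be automated along these lines. This is a sketch only: the Node class and add_shared_subject are hypothetical names, and the annotator is assumed to supply both the list of verbs and the controlling noun.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    word: str
    rel: str
    children: List["Node"] = field(default_factory=list)


def add_shared_subject(verbs, controller: Node):
    """Give every verb without a Subj child a co-reference to the controller."""
    for v in verbs:
        if not any(c.rel == "Subj" for c in v.children):
            v.children.insert(0, Node(f"<{controller.word}>", "Subj"))
    return verbs


companies = Node("companies", "Obj")
charge = Node("charge", "Mod", [Node("fares", "Obj")])
hire = Node("hire", "Mod", [Node("employees", "Obj")])
add_shared_subject([charge, hire], companies)

print([(c.word, c.rel) for c in charge.children])  # [('<companies>', 'Subj'), ('fares', 'Obj')]
print([(c.word, c.rel) for c in hire.children])    # [('<companies>', 'Subj'), ('employees', 'Obj')]
```

Verbs that already have a Subj dependent are left unchanged, so the function can be run more than once without adding duplicates.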
- Add Features for SyntRole
- This is the deep-syntactic role, not
the surface-syntactic role! Every single verb MUST HAVE A SUBJ
DEPENDENT!
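A small check like the following could help catch verbs that still lack a Subj dependent. It is a sketch under assumptions: the Node class is hypothetical, and verbs are assumed to carry the POS value V.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    word: str
    pos: str
    rel: str
    children: List["Node"] = field(default_factory=list)


def verbs_missing_subj(root: Node, verb_pos: str = "V"):
    """Return the words of all verb nodes that have no Subj child."""
    missing = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.pos == verb_pos and not any(c.rel == "Subj" for c in node.children):
            missing.append(node.word)
        stack.extend(node.children)
    return missing


tree = Node("allows", "V", "Root", [
    Node("law", "N", "Subj"),
    Node("own", "V", "Obj", [Node("property", "N", "Obj")]),
])
print(verbs_missing_subj(tree))  # ['own']
```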
- Correct
- Bad Spelling
such as It would he
good to launch the company in October.
- Bad POS
such as
- proper nouns should have POS
PN
- anything with a numeral is Num:
30, $200, etc.
- Bad Relations
such as Launching(Mod of caused) the company caused
many problems -> Launching(Subj of caused) the company caused many
problems
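Of the corrections above, the numeral rule is the easiest to automate. The following sketch uses a hypothetical helper, corrected_pos, which treats any word containing a digit as Num. Spelled-out numbers, proper nouns and bad relations are not covered and still need manual review.

```python
def corrected_pos(word: str, pos: str) -> str:
    """Return Num for any word containing a digit, else the original POS."""
    if any(ch.isdigit() for ch in word):
        return "Num"
    return pos


for word, pos in [("30", "N"), ("$200", "N"), ("company", "N")]:
    print(word, corrected_pos(word, pos))
# 30 Num
# $200 Num
# company N
```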