Protein Name Tagging for PRONTO
Ontologies for biology are crucial in data integration from multiple databases and in
literature mining for knowledge extraction and evidence attribution. Ontology development,
however, currently requires substantial knowledge acquisition and human effort.
We introduce an NSF-supported research project at Georgetown University that is
using NLP tools to semi-automatically induce an ontology of protein names (PRONTO).
The protein names in PRONTO are discovered by an automatic tagger that uses a
combination of statistical classifiers trained on annotated MEDLINE abstracts.
We describe the first set of annotation guidelines and inter-annotator
reliability analyses for protein name tagging, along with comparisons of the
tagger's performance with recently reported results. We also describe the first
steps towards induction of PRONTO, which is linked to both the Gene Ontology
and a comprehensive database of information on proteins (UNIPROT).
About the Speaker:
Additional information is available at http://complingone.georgetown.edu/%7Elinguist/inderjeet.html
For the colloquium series schedule, see the UMD Computational <http://www.umiacs.umd.edu/research/CLIP/colloq/>.
If you are interested in meeting with the speaker, please contact Doug Oard <http://www.glue.umd.edu/~oard/> (oard@umiacs.umd.edu).