Protein Name Tagging for PRONTO

Inderjeet Mani

Georgetown University


UMIACS Computational Linguistics Colloquium

February 18, 2004, 11:00am, AVW Room 2120


Ontologies for biology are crucial in data integration from multiple databases and in literature mining for knowledge extraction and evidence attribution. Ontology development, however, currently requires substantial knowledge acquisition and human effort. We introduce an NSF-supported research project at Georgetown University that is using NLP tools to semi-automatically induce an ontology of protein names (PRONTO). The protein names in PRONTO are discovered by an automatic tagger that uses a combination of statistical classifiers trained on annotated MEDLINE abstracts. We describe the first set of annotation guidelines and inter-annotator reliability analyses for protein name tagging, along with comparisons of the tagger's performance with recently reported results. We also describe the first steps towards induction of PRONTO, which is linked both to the Gene Ontology and to a comprehensive database of information on proteins (UNIPROT).

 


 About the Speaker:

 

Inderjeet Mani is a computational linguist, one of several thousand people who study human languages from a computational perspective. Computers today can search for information, organize it, and gather facts from large quantities of data. They can also answer simple factoid questions and produce summaries that are extracts from the original text. However, to go beyond these capabilities in processing human languages, they need to understand what we mean. His focus is therefore on getting computers to do a better job of understanding meaning. His research activities are in two specific areas.

Additional information is available at http://complingone.georgetown.edu/%7Elinguist/inderjeet.html

 

For the colloquium series schedule, see the UMD Computational Linguistics Colloquium page <http://www.umiacs.umd.edu/research/CLIP/colloq/>. If you are interested in meeting with the speaker, please contact Doug Oard <http://www.glue.umd.edu/~oard/> (oard@umiacs.umd.edu).