With the double-exponential growth of the peer-reviewed literature, the amount of information relevant to an experimental biologist is exploding, making it harder than ever for an individual to find and assimilate all of the relevant knowledge. The National Library of Medicine (NLM) has responded to this need by providing handcrafted Gene References Into Function (GeneRIFs), short descriptions of a gene's function derived from the primary literature. In this talk, I will describe an end-to-end system for text mining from GeneRIFs that ensures our ability to find a sufficient number of textual inputs, to locate the genes of interest within them, to map those genes to the appropriate Entrez Gene entries, and to characterize the appropriate relations between them in the protein transport domain.
Zhiyong Lu holds a Ph.D. in Bioinformatics from the University of Colorado Health Sciences Center. He also obtained a BS from Nanjing University in 2001 and an MS from the University of Alberta in 2003, both in Computer Science. His research lies at the intersection of computer science, biology, and computational linguistics. More specifically, he has focused on the problems of helping bench scientists find the specific publications that are relevant to their work and, once those documents are found, making that (sometimes very large) body of text manageable for them. He joined the National Center for Biotechnology Information (NCBI) as a Staff Scientist in 2007.
This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.