CMSC 498T - Bio Data Mgmt

CMSC 498T SPECIAL PROBLEMS IN COMPUTER SCIENCE - Spring 2006

***NEW*** Course prerequisites have changed; CMSC 420 is now the only prerequisite ***NEW***

Louiqa Raschid

Smith School of Business and UMIACS and the Center for Bioinformatics and Computational Biology

Data Management for the Biological Enterprise

TuTh 12:30-1:45 CSI Room 1122

Class description

The e-biology revolution has produced an explosion of complex data for the biological enterprise. New technologies yield high data production rates. Biological data sources are numerous and diverse in format and access structure, and they support a wide range of search and computational capabilities. The Human Genome Project sparked broad interest in bioinformatics, and the last few years have seen emerging activity in databases, data mining, machine learning, and information retrieval for life science applications. Thus, a biological scientist or a computer scientist with an interest in bioinformatics or computational biology has to be familiar with the challenges of biological data management.

The NIH has recently launched many National Centers for Biomedical Computing. The National Center for Integrative Biomedical Informatics (NCIBI) and the National Center for Biomedical Ontology both address issues in biological data management. This course will explore some of the research challenges addressed by these centers.

This course will explore a range of issues that affect data management and database integration in the life science domain, including the following:

Data models and data representation.
Query languages and query evaluation.
Architectures and protocols for database integration.
Syntactic and semantic impediments to database integration.
Semantic Web for life sciences.

This course has five objectives.

First, we introduce the student to the basics of database management technology and the basics of genomics. This includes the ER and relational data models and the SQL query language, as well as object-oriented concepts and semi-structured data (XML). Basic ideas from genomics and molecular biology will also be introduced.
Second, we consider one or two applications, such as developing a model organism database or constructing a clinical data repository, and step through the database design lifecycle.
Third, we cover a variety of architectures and solutions that have been used for data integration. These include scripts that access data in XML or ASN.1 format; data warehouses; multi-DBMS and mediation technology; and SOAP- and UDDI-based middleware.
Fourth, we explore a myriad of publicly accessible data sources, portals, and repositories, e.g., NCBI Entrez, PDB, and UniProt, and consider their contents and their search and computational capabilities.
Fifth, students complete a team-based project in which they use their knowledge of data models, query languages, and integration architectures to support biological discovery through the exploration of multiple Web-accessible data sources.

This course is targeted at computer science and computer engineering seniors and juniors with a strong interest in the life sciences, and at life science seniors and juniors with a strong computational background. The prerequisite is CMSC 420. If you are a life science major and you are taking CMSC courses, you can contact Professor Raschid for more information.

Class format