CMSC 498T SPECIAL PROBLEMS IN COMPUTER SCIENCE - Spring 2006
***NEW*** Course prerequisites have changed; CMSC 420 is now the only prerequisite ***NEW***
Louiqa Raschid
Smith School of Business and UMIACS and the Center for
Bioinformatics and Computational Biology
Data Management for the Biological Enterprise
TuTh 12:30-1:45 CSI Room 1122
Class description
The e-biology revolution has resulted in an explosion of complex
data for the biological enterprise. New technologies result in
high data production rates. Biological data sources are numerous
and exhibit a diversity of format and access structures. They also
support a diversity of search and computational capabilities.
A wide interest in bioinformatics was sparked by the human genome
project. The last few years have seen emerging activity in the
areas of database, data mining, machine learning, and information
retrieval for life science applications. Thus, a biological
scientist or a computer scientist with an interest in bioinformatics
or computational biology has to be familiar with the challenges
of biological data management.
The NIH has recently launched several National Centers for Biomedical
Computing. Two of them, the National Center for Integrative Biomedical
Informatics (NCIBI) and the National Center for Biomedical Ontology,
address issues in biological data management. This course will explore
some of the research challenges addressed by these Centers.
The course covers a range of issues that affect data management and
database integration in the life science domain, including the
following:
- Data models and data representation.
- Query languages and query evaluation.
- Architectures and protocols for database integration.
- Syntactic and semantic impediments to database integration.
- Semantic Web for life sciences.
This course has five objectives.
- First, we introduce the student to the basics of database management
  technology and the basics of genomics. This includes the
  entity-relationship (ER) and relational data models, the SQL query
  language, object-oriented concepts, and semi-structured data (XML).
  Basic ideas from genomics and molecular biology will also be
  introduced.
- Second, we consider one or two applications, such as developing a
  model organism database or constructing a clinical data repository,
  and step through the database design lifecycle.
- Third, we cover a variety of architectures and solutions that have
  been used for data integration. These include scripts that access
  data in XML or ASN.1 format; data warehouses; multi-DBMS and mediation
  technology; and SOAP- and UDDI-based middleware.
- Fourth, we explore a range of publicly accessible data sources,
  portals, and repositories, such as NCBI Entrez, PDB, and UniProt, and
  consider their contents and their search and computational
  capabilities.
- The final element is a team-based project in which students use their
  knowledge of data models, query languages, and integration
  architectures to support biological discovery through the exploration
  of multiple Web-accessible data sources.
This course is targeted at computer science/computer engineering
seniors and juniors with a strong interest in the life sciences or
life science seniors and juniors with a strong computational background.
The prerequisite is CMSC 420.
If you are a life science major taking CMSC courses, contact
Professor Raschid for more information.
Class format