| CMSC858E: Algorithms for Biosequence Analysis [ Fall 2005 ] |
 |
 |
Essential Details
Time: Tuesday & Thursday, 11:00-12:15.
Location: CSIC 2118.
Staff: Nathan Edwards, Biomolecular Sciences Building 3119, nedwards @ umiacs.umd.edu
Office Hours: Thursday, 12:15-1:30.
Description
Introduction to the algorithms of sequence analysis for
bioinformatics. Bioinformatics tools are evaluated empirically, by how
well they solve specific problems in genetics, molecular biology and
medicine. Effective tools are the result of good modeling decisions
that balance theory and pragmatism. The course will study the core
algorithmic ideas and modeling trade-offs used in the tools that are
changing the way biology is studied. A particular emphasis will be
placed on problems with significant combinatorial structure. The
course will be organized around real problems with a common inference
technique: sequence alignment, sequence motifs, population genetics,
high-throughput experiment design, and inverse problems.
Prerequisites
The course is intended for graduate students in computer science and
related fields. The course will assume no background in genetics or
molecular biology, but will assume a solid background in algorithms
and data-structures.
Course Topics
This list of course topics is tentative and may change as the semester
progresses subject to student interest and time constraints.
- Molecular Biology Primer
- Sequence Alignment
- Exact String Matching: Boyer-Moore, Knuth-Morris-Pratt, Aho-Corasick, Keyword-Trees, Suffix-Trees, Suffix-Arrays, Burrows-Wheeler Transform.
Inexact String Matching: Pairwise Alignment, Heuristic Speedup, Alignment Statistics.
Applications: BLAST, Comparative Genomics.
- Sequence Motifs
- Markov Chains and Hidden Markov Models: Forward, Backwards and Viterbi Algorithms, Baum-Welch, Expectation-Maximization.
Applications: Motif Finding, Gene Finding, Protein Classification.
- Population Genetics
- Haplotype Phasing: Parsimony, Pure Phylogeny.
SNP Selection: Block-free Tagging SNPs.
Applications: Genotyping, Disease Association, Linkage Analysis.
- High-throughput Experiment Design
- Sequence Uniqueness: Hashing, Mer-Counting, Sequence Compression.
Multiplexing: Well-Assignment Problem.
Applications: Genotyping.
- Inverse Problems
- Proteomics: Peptide Identification, Quantitation.
Genome Assembly: Overlaps, Mate-Pairs, Sequencing-By-Hybridization, Eulerian-Path Assembly.
Textbooks
There are no required textbooks, but students will be assigned reading
from books, papers, and online resources. Some recommended books:
- Gusfield. Algorithms on strings, trees, and sequences
- String matching for computational biology bible.
- Durbin, Eddy, Krogh, and Mitchison. Biological sequence analysis
- Hidden markov models for computational biology bible.
- Jones and Pevzner. An introduction to bioinformatics algorithms
- Introduction to both algorithms and computational biology.
- Pevzner. Computational Molecular Biology
- Advanced algorithms for computational biology.
Coursework and Grading
Students will be required, periodically, to complete homework
assignments using publicly available bioinformatics tools.
Students will be required to implement a "classic" bioinformatics or
computational biology algorithm and apply it to publicly available
data in order to investigate a specific biological
hypothesis. Students are encouraged to come up with their own
proposals in consultation with the instructor; potential projects will
be identified as the course progresses. Students will be required to
present details of the algorithm, their implementation, and their
insights towards the end of the semester.
Final grades will be based on homework (10%), project (40%), and
mid-term (20%) and final (30%) exams.
|