ENEE759G: Data Mining and Knowledge Discovery   

Instructor: Joseph JaJa

Fall 2005 Course Syllabus              

Course Objectives: The course will cover fundamental techniques used for analyzing and classifying large-scale scientific and business data. These techniques, primarily based on machine learning and statistical methodologies, will include: statistical models and patterns, supervised learning, Bayesian and neural networks, support vector machines, search and optimization, finding patterns and rules, anomaly detection, and content-based retrieval.

Course prerequisites: Graduate standing

Prerequisite topics: Basic algorithms and optimization techniques, and a good background in statistics. A background in nonlinear optimization is desirable.

Textbook: Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Pearson, Addison Wesley, 2006.

References:

  1. Machine Learning, Tom Mitchell, McGraw-Hill, 1997.
  2. Principles of Data Mining, D. Hand, H. Mannila, and P. Smyth, MIT Press, 2001.
  3. An Introduction to Support Vector Machines, Nello Cristianini and John Shawe-Taylor, Cambridge University Press, 2000.
  4. Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, MIT Press, 1996.

Core Topics:

1. Data Preprocessing and Exploration (Chapters 2 and 3)

2. Fundamental Classification Strategies (Chapters 4 and 5)

3. Clustering Techniques (Chapter 8)

4. Mining for Rules (Chapter 6)

6. Anomaly Detection (Chapter 10)

Course Grade: Midterm (30%); Final (30%); Project (40%)

Project

Each student is expected to define a project that involves two of the general techniques discussed in class and to explain in some detail their performance on at least three significant data sets. A proposal explaining the project and the experimental work to be carried out is due on September 29, 2005.

Information about software and data sets related to all the topics covered in class can be found at the textbook web site: www-users.cs.umn.edu/~kumar/dmbook

Each student is expected to give a presentation on her project during the last two weeks of class. Final reports are due December 8.

Contact Information: joseph@umiacs.umd.edu; 301-405-1925.

Office: 3433 A.V. Williams Bldg; Office Hours: Monday and Wednesday, 3-4:30

Midterm: Tuesday, October 18; Final: Wednesday, Dec. 21, 10:30-12:30 (may be rescheduled)