Digging into Data

Logistics

Location Hornbake 105
Time Mon. 18:00-20:45
Textbook Data Mining with Rattle and R
Webpage http://umiacs.umd.edu/~jbg/teaching/DATA_DIGGING/
Mailing List https://piazza.com/umd/spring2014/inst737/home
Syllabus https://docs.google.com/document/pub?id=1FaGS5tuehBRPHwinE0ZAigB_jYIvkuU-9L6v447UsH4

People

Professor

Jordan Boyd-Graber
Hornbake 2118C
Office Hours (Hornbake 2118C): Monday 16:00 - 17:00 and by appointment

Homeworks

Schedule

Date Subject Assignment Due Lecture
Jan 27 Introduction to Data Science, R, and Rattle [PDF] [Video]
Readings: Downloads:
Feb 3 Probability Crash Course HW0 [PDF] [Video]
Readings:
Feb 10 Properties of Data [PDF] [Video]
Readings:
  • Williams 3-3.5, 5
  • Skim Williams 4 (Know what's there so you can use it as a reference later)
  • (Optional) ggplot2 tutorial
Links to resources I'll be using for demos:
Links to sources and outputs so you can try this out yourself:
In-class exercises:
Feb 17 Linear Regression [PDF] [Video]
Readings:
  • Williams 7
  • Chapter 3 of Gelman and Hill (to be handed out in class). Focus on the concepts, not the R commands (unless you want to).
  • (Optional, if you want more detail) Andrew Ng's regression notes
  • (Optional, for review of the normal distribution) The Normal Distribution
In Class:
Feb 24 Classification I: Naive Bayes and Logistic Regression HW1 [PDF] [Video]
Readings: In Class:
Mar 3 NO CLASS: WINTER STORM TITAN!
Mar 10 Classification II: Decision Trees and SVMs HW2 [PDF] [Video]
Readings:
  • Williams 11, 14, 16
In Class:
Mar 17 NO CLASS: SPRING BREAK!
Mar 24 Midterm Review (Optional) [2013]

If you feel you have a good handle on the content, you don't have to come to the review. Please do, however, at least look at the practice problems. If you can complete them without difficulty, feel free to skip the class and start thinking about your project.

Mar 31 Midterm

The midterm will be in class. There will be 20-50 multiple-choice questions and 1-3 short-answer questions. The questions will be similar to the in-class exercises and review questions. Example questions will be provided before the midterm review.

You can bring a simple calculator (no device with an internet connection, even if it's disabled) and one A4 or US letter sheet of paper with notes (front and back). Your note sheet need not be hand-written, but you should prepare it yourself (it's a useful exercise).

Apr 7 Evaluating Annotations, Feature Selection, and Feature Engineering Project Proposal Video: [A B C D] [PDF]
Readings: Example:
Apr 14 Unsupervised Clustering HW3 Video: [A B]
PDF: [A B]
Readings:
Apr 21 Active Learning and Dualist Demo [PDF] [Video]
Readings:
Apr 28 Deep Learning [PDF]
Demo:
May 5 Project Workshop
May 12 Project Presentations Project Report