================================================================
Fall 2001 study guide
================================================================

o Information needs
  - Visceral, Conscious, Formalized, Compromised
  - Utility vs. relevance
  - Query by example vs. query formulation
  - Known item vs. ad hoc search

o IR process model
  - System architecture

o Text representation
  - Controlled vocabulary (supervised learning)
  - Bag of terms (segmentation, phrases, stemming, morphology, stopwords)
  - Word sense disambiguation (co-occurrence, effects)
  - Term weights (term frequency, IDF, length normalization, Okapi)

o Indexing
  - Inverted indexing (time and space complexity, postings file)
  - Supporting proximity operators

o Matching
  - Boolean (computation, why it works, limitations)
  - Ranked retrieval paradigm
  - Vector space (cosine, why it works)
  - Probabilistic (Bayes' theorem, prob. ranking principle, language models)
  - Passage retrieval
  - Blind relevance feedback

o Selection
  - Summarization (indicative vs. informative)
  - Visualization (projection, starfield, contour maps)

o Evaluation
  - Criteria (effectiveness, efficiency, usability)
  - Measures of effectiveness (recall, precision, F, MAP, DET, averages)
  - Pooled relevance assessment (limitations)
  - User studies
  - Statistical significance (hypothesis testing)

o Filtering
  - Profile indexing
  - Profile learning (Rocchio)
  - Recommender systems
  - Implicit feedback (links, time)

o Question answering
  - Predictive annotation

o Cross-language retrieval
  - Interactive query formulation
  - Language identification
  - Dictionary-based techniques (unbalanced, balanced, structured)
  - Corpus-based (parallel corpora, comparable corpora)
  - Evaluation
  - Interactive selection

o Audio retrieval
  - Speech (words vs. phoneme n-grams, speaker segmentation, interface design)
  - Music (contour matching)

o Image retrieval
  - Color histograms
  - Texture matching
  - Image segmentation (relative position indexing)

o Video retrieval
  - Motion detection (camera, object)
  - Shot structure (boundary detection, classification)
  - Video OCR (closed caption, on-screen caption, scene text)
  - Interface design (key frames, storyboard, slide show, salient stills)

o Document image retrieval
  - Skew correction
  - Layout analysis (text region detection, reading order, captions)
  - Document type detection (structural cues)
  - OCR (shape codes)
  - Error-tolerant matching (character n-grams, byte length normalization)
  - Handwriting recognition
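
Review sketch for the "Term weights", "Inverted indexing", and "Vector space (cosine)" items above. This is a minimal illustration, not course material: the names tokenize, build_index, rank, and DOCS are hypothetical, the toy documents are made up, and IDF is taken as log(N / df), which is only one common variant (it is not the Okapi weighting). No stemming or stopword removal is applied.

```python
import math
from collections import Counter

# Hypothetical toy collection, for illustration only.
DOCS = {
    "d1": "inverted index postings file",
    "d2": "vector space cosine ranking",
    "d3": "boolean retrieval inverted index",
}


def tokenize(text):
    """Lowercase and split on whitespace (assumed, very crude segmentation)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}, plus IDF."""
    index = {}
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index.setdefault(term, {})[doc_id] = tf
    n_docs = len(docs)
    # IDF variant assumed here: log(N / document frequency).
    idf = {term: math.log(n_docs / len(postings))
           for term, postings in index.items()}
    return index, idf


def rank(query, docs):
    """Return [(doc_id, cosine)] sorted by decreasing cosine similarity."""
    index, idf = build_index(docs)

    # Squared length of each document's tf*idf vector (length normalization).
    doc_sq_norm = Counter()
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            doc_sq_norm[doc_id] += (tf * idf[term]) ** 2

    # Query vector restricted to terms that occur in the collection.
    q_weights = {term: qtf * idf[term]
                 for term, qtf in Counter(tokenize(query)).items()
                 if term in index}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    if q_norm == 0:
        return []  # no query term carries any weight

    # Accumulate dot products by walking only the relevant postings lists.
    dot = Counter()
    for term, q_w in q_weights.items():
        for doc_id, tf in index[term].items():
            dot[doc_id] += q_w * tf * idf[term]

    scored = [(doc_id, d / (math.sqrt(doc_sq_norm[doc_id]) * q_norm))
              for doc_id, d in dot.items() if doc_sq_norm[doc_id] > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for doc_id, score in rank("inverted index cosine", DOCS):
        print(doc_id, round(score, 3))
```

Walking the postings lists of the query terms, rather than scanning every document, is the reason the inverted index makes ranked retrieval cheap: documents that share no term with the query are never touched.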