=================================================================
LBSC 796/INFM 718R Fall 2007 Study Guide
=================================================================

o Structure of IR Systems
  - IR process model
  - System architecture
  - Information needs: Visceral, Conscious, Formalized, Compromised
  - Utility vs. relevance
  - Known item vs. ad hoc search

o Evidence from Content and Ranked Retrieval
  - Inverted indexing (time and space complexity, postings file)
  - Bag of terms (segmentation, phrases, stemming, morphology, stopwords)
  - Boolean retrieval
  - Proximity operators
  - Term weights (term frequency, IDF, length normalization, Okapi)
  - Vector space ranked retrieval (TF*IDF, cosine normalization, Okapi BM25)
  - Probabilistic retrieval (language models)
  - Passage retrieval
  - Blind relevance feedback

o Interaction
  - Query by example vs. query formulation
  - Summarization (indicative vs. informative)
  - Clustering
  - Visualization (projection, starfield, contour maps)

o Evaluation
  - Criteria (effectiveness, efficiency, usability)
  - Measures of effectiveness (recall, precision, F, MAP, averages)
  - Pooled relevance assessment
  - Statistical significance
  - User studies

o Web Search
  - Crawling
  - PageRank
  - Deep Web (i.e., database-generated content)
  - Semantic Web

o Evidence from Behavior
  - Implicit feedback
  - Privacy risks
  - Recommender systems

o Document image retrieval
  - Layout analysis
  - OCR and shape codes
  - Error-tolerant matching (character n-grams, byte length normalization)
  - Handwriting recognition

o Evidence from Metadata
  - Standards (e.g., Dublin Core)
  - Controlled vocabulary
  - Text classification
  - Information extraction

o Filtering
  - Profile indexing
  - Profile learning (Rocchio)
  - Implicit feedback (links, time)

o Audio retrieval
  - Speech (words vs. phone n-grams, speaker segmentation, interface design)
  - Music (contour matching)

o Cross-language retrieval
  - Interactive query formulation
  - Language identification
  - Dictionary-based techniques
  - Corpus-based techniques (parallel corpora, comparable corpora)
  - Evaluation
  - Interactive selection

o Image retrieval
  - Color histograms
  - Texture matching
  - Image segmentation (relative position indexing)

o Video retrieval
  - Motion detection (camera, object)
  - Shot structure (boundary detection, classification)
  - Video OCR (closed caption, on-screen caption, scene text)
  - Interface design (key frames, storyboard, slide show, salient stills)

--------------------------- End ------------------------
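Study aid for "Inverted indexing" and "Vector space ranked retrieval" under Evidence from Content and Ranked Retrieval: a minimal Python sketch, not the course's reference implementation. It assumes whitespace tokenization with no stemming or stopword removal, raw term frequency, IDF = log(N/df), and cosine normalization. The function names (tokenize, build_index, doc_norms, rank) are hypothetical. Scoring is term-at-a-time, so only the postings of query terms are read, which is the point of an inverted index. Okapi BM25 would replace the tf*idf weight with a saturating, length-normalized one.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase + whitespace split (no stemming, no stopwords).
    return text.lower().split()


def build_index(docs):
    """Inverted index: term -> postings {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def doc_norms(index, n_docs):
    """Euclidean length of each document's tf*idf vector (for cosine normalization)."""
    sq = defaultdict(float)
    for postings in index.values():
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            sq[doc_id] += (tf * idf) ** 2
    return {doc_id: math.sqrt(s) for doc_id, s in sq.items()}


def rank(query, index, norms, n_docs):
    """Cosine similarity between tf*idf query and document vectors, best first."""
    q_weights = {}
    for term, qtf in Counter(tokenize(query)).items():
        if term in index:  # membership test does not create a key in the defaultdict
            q_weights[term] = qtf * math.log(n_docs / len(index[term]))
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))

    # Accumulate dot products by walking only the query terms' postings.
    scores = defaultdict(float)
    for term, qw in q_weights.items():
        idf = math.log(n_docs / len(index[term]))
        for doc_id, tf in index[term].items():
            scores[doc_id] += qw * tf * idf

    results = [
        (doc_id, dot / (q_norm * norms[doc_id]))
        for doc_id, dot in scores.items()
        if dot > 0 and q_norm > 0 and norms[doc_id] > 0
    ]
    return sorted(results, key=lambda item: (-item[1], item[0]))


docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog sat on the log",
    "d3": "cats and dogs",
}
index = build_index(docs)
norms = doc_norms(index, len(docs))
print(rank("sat dog", index, norms, len(docs)))
```

With this toy collection, "sat dog" ranks d2 above d1 because d2 matches both terms. Without stemming, "cats" in d3 never matches "cat".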
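Study aid for "Measures of effectiveness (recall, precision, F, MAP, averages)" under Evaluation: a minimal Python sketch, assuming binary relevance judgments, a ranked list with no duplicate document IDs, and the common convention that a relevant document never retrieved contributes zero to average precision. The helper names are hypothetical. MAP here is the unweighted mean of per-topic average precision.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top k results (k >= 1, relevant non-empty)."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k, hits / len(relevant)


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta = 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document;
    relevant documents that are never retrieved add 0 to the sum."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: one (ranked list, relevant set) pair per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
p, r = precision_recall_at_k(ranked, relevant, 5)
print(p, r, f_measure(p, r), average_precision(ranked, relevant))
```

In the example, relevant documents appear at ranks 2 and 4 (precision 1/2 and 2/4) and d5 is never retrieved, so average precision is (0.5 + 0.5) / 3 = 1/3.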