=================================================================
LBSC 796/INFM 718R Spring 2011 Study Guide
=================================================================

o Structure of IR Systems
  - IR process model
  - System architecture
  - Information needs: Visceral, Conscious, Formalized, Compromised
  - Utility vs. relevance
  - Known item vs. ad hoc search

o Evidence from Content and Ranked Retrieval
  - Inverted indexing (postings file)
  - Bag of terms (segmentation, phrases, stemming, stopwords)
  - Boolean retrieval
  - Vector space ranked retrieval (TF, IDF, length normalization, BM25)
  - Probabilistic ranked retrieval (language models)
  - Blind relevance feedback

o Interaction
  - Query formulation vs. query by example
  - Summarization (indicative vs. informative)
  - Clustering
  - Visualization (projection, starfield, contour maps)

o Evaluation
  - Criteria (effectiveness, efficiency, usability)
  - Measures of effectiveness (recall, precision, F, MAP, NDCG)
  - Pooled relevance assessment
  - Statistical significance
  - User studies

o Web Search
  - Crawling
  - PageRank
  - Anchor text
  - Deep Web (i.e., database-generated content)

o Evidence from Behavior
  - Implicit feedback
  - Privacy risks
  - Recommender systems

o Document image retrieval
  - Layout analysis
  - OCR and shape codes
  - Error-tolerant matching (e.g., character n-grams)
  - Handwriting recognition

o Evidence from Metadata
  - Standards (e.g., Dublin Core)
  - Controlled vocabulary
  - Text classification
  - Information extraction

o Filtering
  - Profile learning (Rocchio)

o Audio retrieval
  - Speech (words vs. phone n-grams, interface design)
  - Music (contour matching)

o Cross-language retrieval
  - Dictionary-based techniques
  - Corpus-based techniques (parallel corpora, comparable corpora)
  - Evaluation
  - Interactive selection

o Image retrieval
  - Color histograms
  - Texture matching
  - Image segmentation (relative position indexing)

o Video retrieval
  - Motion detection (camera, object)
  - Shot structure (boundary detection, classification)
  - Video OCR (closed caption, on-screen caption, scene text)
  - Interface design (key frames, storyboard, slide show, salient stills)

--------------------------- End ------------------------