LBSC708A Test Notes

What is this class about?

A shared view of the most important points:
o Ranked Retrieval / Retrieval Models (Vector Space, Probabilistic, Boolean)
o Feature / Term Extraction / Segmentation / Stemming
o Evaluation Techniques / Recall and Precision / Desiderata
o System Models
o Machine Learning (Indirectly)
o Bag of Words / Terms
o Indexing
o Tradeoffs
o User Interface (QF, SI, EX)
o Summarization
o Social Filtering

All cross-referenced with:
o Cross Language
o Speech
o Video
o Document Images

Sample exam questions:

(45 minutes) Recent estimates suggest that the size of the World Wide Web is now doubling about every five months. Explain what effect this rapid growth will have on the ability of search engines to give rapid, complete, and accurate responses to user queries.

(3 minutes) How many angels can fit on the head of a pin?

(20 minutes) One criticism of "bag of words" approaches to information retrieval is that they don't capture evidence of meaning that comes from the relationships among terms. Explain the meaning of "term" in the context of information retrieval, and explain how an appropriate choice of terms can help to overcome this problem.

(25 minutes) How could collaborative filtering be used in cross-language retrieval, if speakers of both languages are available to provide user assessments?

(15 minutes) Discuss some of the ways IR systems process non-Latin scripts and how this differs from the processing of Latin characters.

(15 minutes) Identify the parts of the information retrieval model and describe the relationships between them.

(20 minutes) Evaluate three methods of segmentation in text-based IR. Which would be best for multimedia retrieval, such as video?

(20 minutes) Describe the advantages and disadvantages of each retrieval model we have studied.