WWW 2008 / Poster Paper, April 21-25, 2008 · Beijing, China

Collaborative Knowledge Semantic Graph Image Search

Jyh-Ren Shieh¹, Yang-Ting Yeh¹, Chih-Hung Lin¹, Ching-Yung Lin², Ja-Ling Wu¹

¹ Dept. of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan
{jerry, dy, meconin, wjl}@cmlab.csie.ntu.edu.tw

² IBM T. J. Watson Research Center, Hawthorne, NY 10532, USA
chingyung@us.ibm.com

ABSTRACT
In this paper, we propose a Collaborative Knowledge Semantic Graph Image Search (CKSGIS) system. It provides a novel way to conduct image search by utilizing the collaborative nature of Wikipedia and by performing network analysis to form semantic graphs for search-term expansion. The collaborative article editing process used by Wikipedia's contributors is formalized as bipartite graphs that are folded into networks between terms. When a user types in a search term, CKSGIS automatically retrieves an interactive semantic graph of related terms that allows users to easily find related images not limited to a specific search term. The interactive semantic graph then serves as an interface to retrieve images through existing commercial search engines. This method saves users significant time by avoiding the multiple search keywords that are usually required in generic search engines. It benefits both naïve users who do not possess a large vocabulary and professionals who look for images on a regular basis. In our experiments, 85% of the participants favored the CKSGIS system over commercial search engines.

Categories and Subject Descriptors
J.4 [Social and Behavioral Sciences]; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms
Algorithms, Human Factors, Experimentation

Keywords
Social Network, Keyword Expansion, Re-ranking

Copyright is held by the author/owner(s). WWW 2008, April 21-25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04.

1. INTRODUCTION
As of December 2007, Google Images provides keyword-based queries over more than one billion images on the Internet. While most commercial image search engines mainly rely on hyperlinks and metadata to index images [1], researchers have been investigating the concept detection approach in order to understand image content directly and annotate images with keyword terms [2]. Keyword-based querying is more natural to general users. However, it shares the same obstacle as information retrieval in general: what keywords should I enter in the search box to find the information I really need? Users often have difficulty finding the right term. Content-based image search tried to circumvent this problem. However, many terms are difficult to portray visually. On most occasions, we do not know what to "ask" and thus spend a lot of time trying various search terms again and again until we find what we want.

With this trend in mind, we propose to utilize the collaborative knowledge of human beings on Wikipedia to find semantic graphs of search terms. We propose to use an important hidden linkage between terms, namely the people (i.e., article editors) who work on them, to assess whether terms are related and how closely. Our hypothesis is that when the same person has contributed to two articles, there is a high likelihood that the two terms are related in some way, such that an "expert" knows both of them. We thus draw a bipartite graph between authors and terms and collapse it into a one-dimensional graph between terms with weighted links. Since these relationships capture, to some extent, the human knowledge behind the scenes, the constructed semantic graphs appear useful and can serve as an important basis for query expansion in image searches.

The rest of this paper is organized as follows. We first present the framework of CKSGIS in Section 2. In Section 3, we describe the image search interface. Experimental results for the new system are reported in Section 4. Finally, conclusions and directions for future work are provided in Section 5.

2. THE FRAMEWORK OF CKSGIS
The framework has three major components. In the first module, we use Wikipedia's concept network and social network to compute the semantic distance between relevant terms. In the second module, we use the "Image Query Concept Interpreter," based on NLP approaches, to extract concept terms from a keyword or phrase input. Using the semantic weights resulting from the first module, we plot a semantic relatedness graph for the image query. We also rank the importance of each correlated keyword. In the third module, we use the information gathered from both the "Semantic Relatedness Graph Plot" and the "Important Keyword Ranking" to display images for each query and to make recommendations according to the obtained relatedness and trends.

[Figure 1 panels: Construction of Semantic Relatedness (Wikipedia; concept network analysis from internal links; social network analysis from contributors; Semantic Relatedness Weighting over Term 1, Term 2, Term 3, ..., Term N); Image Search Interface (keyword or phrase input; Image Query Concept Interpreter; Semantic Relatedness Graph Plot; Important Keyword Ranking); Image Display and Recommendation Interface (Image Display).]

Figure 1. The framework of the Collaborative Knowledge Semantic Graph Image Search system.

3. IMAGE SEARCH INTERFACE
When a user inputs a keyword, a semantic relatedness graph built around the keyword is plotted on the left-hand side of the interface. We represent the related terms as nodes and the relationships between terms as edges. The edge width indicates the relatedness weight between two terms, i.e., a thicker edge represents a closer semantic relatedness. We also provide a scroll bar to adjust the threshold on the relatedness weight between two nodes. Meanwhile, if the user moves the cursor onto one of the keywords in the semantic relatedness graph, the dynamically ranked image search results from an external image search engine are shown on the right-hand side. Figure 2 shows an example based on the keyword "solar energy."

In real time, CKSGIS parses query strings and checks whether the semantic graph of the query term has been previously stored. If not, the system checks whether the term exists as a new entry in Wikipedia and, if so, generates a semantic graph in real time by linking that term to other known terms in the database.

Figure 2. On the left-hand side, users see a semantic graph whose center node is the search keyword, "solar energy." Whenever the user mouses over a node, its Google Image search results are displayed on the right. The radial distance of a term from the center represents how many semantic degrees it is away from the search term. The thickness of a link represents its relatedness weight. The importance of each correlated term relative to the keyword is also ranked in the upper right.

3.1 User-Driven Query Expansions
We incorporate a user-driven query expansion function. In order to learn more about a topic or get an answer to an open-ended question, users can select a group of keywords by either clicking or circling the desired cluster of terms. After the selection has been made, a query is launched based on the selected keywords, and the user is presented with a number of images provided by the search engine. With user-driven query expansion, we help users search for images in a focused and efficient manner.

3.2 Search Term Recommendations
We also incorporate an interface that lists recommended search terms by condensing a semantic graph. In order to rank the terms semantically related to a keyword, we use PageRank with Priors [3] to rank the importance of nodes (terms) in a graph relative to the keyword. Following [3], its iterative stationary probability equation is

$$\pi^{(i+1)}(v) = (1 - \beta) \sum_{u} p(v \mid u)\, \pi^{(i)}(u) + \beta\, p_v ,$$

where $\pi^{(i)}(v)$ is the rank score of node $v$ at iteration $i$, the sum runs over the nodes $u$ with an edge into $v$, $p(v \mid u)$ denotes the transition probability from node $u$ to node $v$, $p_v$ denotes the prior probability of node $v$, and $\beta$ denotes the back probability.

Table 1. PRankP result of the semantic relatedness graph centered on the search term "solar energy."

Rank   Keyword             Rank Score
1      Solar energy        0.379025
2      Renewable energy    0.102116
3      Photovoltaics       0.074272
4      Tidal power         0.059877
5      Wind power          0.057685

4. EXPERIMENTS
A controlled user study with 30 students shows that more than 80% of the participants preferred our semantic relatedness results to those provided by a traditional image search interface. 87% said that the obtained relatedness helps them learn more about the collection. 73% found CKSGIS more flexible, and 73% found it easier to use. These results indicate that our semantic-graph-based approach is a promising method for users' everyday search tasks.

5. CONCLUSIONS AND FUTURE WORK
To the best of our knowledge, we are the first team to explicitly use Wikipedia's social network to compute semantic relatedness and to use semantic graphs for image search. In summary, CKSGIS provides a new method of performing image queries with high-dimensional semantic relatedness. The proposed semantic graphs can be used not only for image searches but also for generic information search. Our early experiments demonstrate the usefulness of the generated semantic graphs. We see a potential advantage in using such semantic graphs for various search tasks in people's daily search activities. In the future, we would like to create semantic graphs for use in generic searches and directly fuse the ranked semantic keyword list with search results. We are also studying the effects of adding personalization and trend prediction to the design.

6. REFERENCES
[1] H. Liu, X. Xie, X. Tang, Z.-W. Li, W.-Y. Ma. Effective Browsing of Web Image Search Results. In Proc. of the 6th ACM SIGMM Workshop on Multimedia Information Retrieval (New York, NY, USA, 2004), pp. 84-90.
[2] M. Naphade, J. R. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, J. Curtis. Large-Scale Concept Ontology for Multimedia. IEEE Multimedia Magazine, vol. 13, no. 3, July-September 2006.
[3] S. White, P. Smyth. Algorithms for estimating relative importance in networks. ACM SIGKDD 2003, pp. 266-275.
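
Illustrative sketch. The folding of the author-term bipartite graph (Section 1) and the PageRank with Priors ranking (Section 3.2) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the toy editing history, the use of co-editor counts as edge weights, the function names `fold_bipartite` and `pagerank_with_priors`, the choice to put all prior mass on the query term, and the value `beta = 0.3` are all hypothetical and are not specified in the paper.

```python
from collections import defaultdict
from itertools import combinations


def fold_bipartite(edits):
    """Fold an author -> terms mapping into a weighted term-term graph.

    Assumption: the weight of edge (a, b) is the number of authors who
    edited both articles. The paper does not give its weighting formula.
    """
    weights = defaultdict(float)
    for terms in edits.values():
        for a, b in combinations(sorted(set(terms)), 2):
            weights[(a, b)] += 1.0
    graph = defaultdict(dict)
    for (a, b), w in weights.items():
        graph[a][b] = w
        graph[b][a] = w
    return dict(graph)


def pagerank_with_priors(graph, root, beta=0.3, iterations=100):
    """Iterate pi(v) = (1 - beta) * sum_u p(v|u) * pi(u) + beta * p_v.

    Assumption: all prior mass p_v sits on the root (query) term, and
    p(v|u) is proportional to the weight of edge (u, v).
    """
    nodes = list(graph)
    prior = {v: 1.0 if v == root else 0.0 for v in nodes}
    pi = dict(prior)
    out_weight = {u: sum(graph[u].values()) for u in nodes}
    for _ in range(iterations):
        nxt = {v: beta * prior[v] for v in nodes}
        for u in nodes:
            for v, w in graph[u].items():
                # Push u's current score to v along the weighted edge.
                nxt[v] += (1.0 - beta) * pi[u] * w / out_weight[u]
        pi = nxt
    return sorted(pi.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical editing history: author -> articles (terms) edited.
edits = {
    "author_a": ["Solar energy", "Photovoltaics", "Renewable energy"],
    "author_b": ["Solar energy", "Renewable energy", "Wind power"],
    "author_c": ["Renewable energy", "Wind power", "Tidal power"],
}
term_graph = fold_bipartite(edits)
ranking = pagerank_with_priors(term_graph, "Solar energy")
for term, score in ranking:
    print(f"{term}: {score:.4f}")
```

Placing the prior on the query term biases the random walk toward terms close to it, so a larger `beta` keeps the ranking more tightly focused on the query's neighborhood, while a smaller one lets globally well-connected terms rise.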