WWW 2008 / Poster Paper, April 21-25, 2008 · Beijing, China

Falcons: Searching and Browsing Entities on the Semantic Web

Gong Cheng (gcheng@seu.edu.cn), Weiyi Ge (wyge@seu.edu.cn), Yuzhong Qu (yzqu@seu.edu.cn)
Institute of Web Science, School of Computer Science and Engineering
Southeast University, Nanjing 210096, P.R. China

ABSTRACT
The amount of data on the Semantic Web has grown considerably, and services for searching and browsing entities on the Semantic Web are in demand. To provide such services, we developed the Falcons system. In this poster, we present the features of the Falcons system.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms
Design, Experimentation

Keywords
Indexing, Search Engine, Semantic Web, Summarization

1. INTRODUCTION
More and more RDF data have been published on the Semantic Web, and searching for entities (concepts and objects) on the Semantic Web is in demand. To meet this demand, we developed the Falcons¹ system. At the time of writing, Falcons has discovered more than 7 million well-formed RDF documents containing 250 million RDF statements, and has identified 4,400 ontologies among them. About 30 million Semantic Web entities have been indexed, of which about 2 million are concepts (classes or properties). This poster presents the services provided by the Falcons system and its supporting technical features.

2. SYSTEM FUNCTIONALITY
Falcons provides keyword-based search for Semantic Web entities. In the search results page, the types and labels of each entity are presented so that users can quickly understand what it denotes. We also present the number of RDF documents in which each entity is used, to show its popularity. We associate each entity with a link to a page listing the RDF documents that define and use it.

Semantic Web developers need to search existing ontologies for reuse in their data. To serve this need, Falcons not only presents the classes and properties that match the query terms, but also dynamically recommends ontologies. Once an ontology is selected, as depicted in Fig. 1, the results are refined to include only the classes and properties in that ontology. This mode of interaction enables users not only to understand specific classes and properties but also to obtain a general view of ontologies.

Figure 1: A screenshot of Falcons Concept Search.

Semantic Web developers and ordinary users also need to search for objects on the Semantic Web. To serve this need, Falcons presents the objects that match the query terms, and also dynamically recommends several types of objects that the user is probably searching for. Once a type is selected, as depicted in Fig. 2, the results are refined to include only the matched objects of that type, which helps users quickly find the desired objects.

Figure 2: A screenshot of Falcons Object Search.

For each entity, to help users quickly understand it, Falcons extracts a set of RDF statements about it from various data sources on the Semantic Web and organizes them into a summary. As depicted in Fig. 3, these statements are not simply listed, but are clustered by ontology, such as FOAF and Dublin Core (DC), according to their predicates. Users can switch between the tabs to browse the statements described using a specific ontology, which often characterize a specific aspect of an entity. In addition, each RDF statement is associated with its source.

Figure 3: A screenshot of Falcons Entity Summary.

3. TECHNICAL FEATURES

3.1 Finding Entities by Virtual Documents
We use information retrieval (IR) techniques to index entities. For each entity, we index all the terms from its virtual document [2]. The virtual document of an entity consists of its names (local name and labels), other associated literals, and the names of its neighboring entities in RDF graphs, decoded from all the RDF documents on the Semantic Web. These additional terms are especially useful when the query terms do not mention the names of the desired entities. In the example depicted in Fig. 2, the labels of ontoworld:WWW2008² do not contain the query term "Beijing", but Falcons can still find the entity because the value of its "has-location-city" property is ontoworld:Beijing. As a result, each term is indexed to more entities than in traditional methods, so we devise a weighting scheme to ensure that well-matched entities, e.g., those whose labels match the query terms, are ranked higher. The popularity of entities is also considered in ranking.

3.2 Recommending Ontologies for Concept Search
For each ontology, we index the virtual documents of all its classes and properties, so the candidate ontologies for each query can be obtained immediately. The recommendation is based on a combination of the TF-IDF technique and the popularity of ontologies: ontologies that contain widely instantiated classes and properties are more likely to be recommended. We also index each ontology to its classes and properties. Combined with the inverted index from terms to concepts, this allows users to be served with only the matched concepts in a specific ontology.

3.3 Recommending Classes for Object Search
To obtain the candidate classes for recommendation, it is impractical to directly build an index from terms to classes, because some classes may have too many instances and hence too many terms to be indexed. Instead, we iterate over the search results on the fly to collect the classes of the resulting objects stored in the index. These classes are then ranked based on the coverage of their instances in the results. The top-ranked classes are selected, dynamically grouped by their names, and recommended to users. We also index each class to its instances. Combined with the inverted index from terms to objects, this allows users to be served with only the matched objects of a specific type.

At the time of writing, the object search service has just been enhanced to allow users to navigate class hierarchies for query restriction. This new feature is enabled by class subsumption reasoning over multiple vocabularies and an improved technique for recommending classes.

3.4 Summarizing Entities for Browsing
RDF statements from different data sources are stored and indexed. For each entity, a set of statements about it is extracted and ranked according to the popularity of the entities in these statements. The MMR technique [1] is used to rerank these statements: the statements are selected into the summary one by one, and once a statement is selected, the ranking values of the remaining statements with a similar predicate (from the same ontology) are decreased. This method improves the diversity of the summaries.

4. CONCLUSION AND FUTURE WORK
Falcons is a keyword-based search system for concepts and objects on the Semantic Web, and is equipped with entity summarization for browsing. Future work includes improving the manipulation of concept spaces for a better user experience in searching and browsing the Semantic Web.

5. ACKNOWLEDGMENTS
This work is supported by the NSFC under Grant 60773106. We are also grateful to Honghan Wu and Xiang Zhang for their work on the system.

6. REFERENCES
[1] Carbonell, J. and Goldstein, J. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proc. SIGIR, pages 335-336, 1998.
[2] Qu, Y., Hu, W., and Cheng, G. Constructing virtual documents for ontology matching. In Proc. WWW, pages 23-31, 2006.

¹ http://iws.seu.edu.cn/services/falcons/.
² The prefix ontoworld indicates http://ontoworld.org/wiki/Special:URIResolver/.

Copyright is held by the author/owner(s). WWW 2008, April 21-25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04.
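The virtual-document indexing of Section 3.1 can be illustrated with a small sketch. It is not the Falcons implementation: the field weights, the function names, and the toy entity descriptions are assumptions made for illustration, and the popularity component of the ranking is left out. It shows only how a neighbor's name lets "Beijing" retrieve ontoworld:WWW2008, while an entity whose own name matches still ranks first.

```python
import re
from collections import defaultdict

# Hypothetical field weights: the paper says label matches should rank
# higher but gives no numbers, so these values are assumptions.
FIELD_WEIGHTS = {"names": 1.0, "literals": 0.5, "neighbors": 0.25}


def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entities):
    """Map each term to {entity: weight}, keeping the best field weight."""
    index = defaultdict(dict)
    for uri, fields in entities.items():
        for field, weight in FIELD_WEIGHTS.items():
            for text in fields.get(field, []):
                for term in tokenize(text):
                    index[term][uri] = max(index[term].get(uri, 0.0), weight)
    return index


def search(index, query):
    """Rank entities by the summed weights of the query terms they match."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for uri, weight in index.get(term, {}).items():
            scores[uri] += weight
    return sorted(scores, key=lambda uri: (-scores[uri], uri))


# Toy virtual documents (illustrative, not real Falcons data).
entities = {
    "ontoworld:WWW2008": {
        "names": ["WWW2008"],
        "neighbors": ["Beijing"],  # via its has-location-city property
    },
    "ontoworld:Beijing": {"names": ["Beijing"]},
}
index = build_index(entities)
results = search(index, "Beijing")
```

Here `results` lists ontoworld:Beijing before ontoworld:WWW2008: both are found, but the match on a name outweighs the match on a neighbor's name.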
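The on-the-fly class recommendation of Section 3.3 can be sketched in the same spirit. The function name `recommend_classes`, the `classes_of` mapping, and the `top_k` cutoff are hypothetical; coverage is taken here to be the fraction of result objects that are instances of a class, which is one plausible reading of the paper. The grouping of classes by name is omitted.

```python
from collections import Counter


def recommend_classes(result_objects, classes_of, top_k=3):
    """Rank classes by the fraction of result objects they cover."""
    counts = Counter()
    for obj in result_objects:
        # Count each class once per object, even if it is listed twice.
        counts.update(set(classes_of.get(obj, ())))
    total = len(result_objects)
    # Sort by descending count, breaking ties by class name.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(cls, n / total) for cls, n in ranked[:top_k]]


# Toy data: the classes stored in the index for each matched object.
classes_of = {
    "ex:a": ["foaf:Person"],
    "ex:b": ["foaf:Person", "ex:Researcher"],
    "ex:c": ["ex:Conference"],
    "ex:d": ["foaf:Person"],
}
recommended = recommend_classes(["ex:a", "ex:b", "ex:c", "ex:d"], classes_of, top_k=2)
```

Because the classes are collected from the current result list only, no index from terms to classes is needed, which is the point made in Section 3.3.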
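The reranking step of Section 3.4 can be sketched as a greedy selection with a penalty. This is a simplified variant that follows the paper's description and not the original MMR formula of [1]: the multiplicative `penalty`, the tuple layout, and the use of the predicate prefix to identify the ontology are all assumptions.

```python
def summarize(statements, size, penalty=0.5):
    """Greedily pick statements, demoting those from an already used ontology.

    statements: list of (predicate, value, score) tuples, where predicate is
    a prefixed name such as "foaf:name" and score is its initial ranking.
    """
    remaining = [[score, pred, value] for pred, value, score in statements]
    summary = []
    while remaining and len(summary) < size:
        best = max(remaining, key=lambda item: item[0])
        remaining.remove(best)
        _, pred, value = best
        summary.append((pred, value))
        ontology = pred.split(":", 1)[0]
        # Lower the scores of remaining statements from the same ontology.
        for item in remaining:
            if item[1].split(":", 1)[0] == ontology:
                item[0] *= penalty
    return summary


# Toy statements about one entity (illustrative values only).
statements = [
    ("foaf:name", "Alice", 0.9),
    ("foaf:nick", "al", 0.8),
    ("foaf:homepage", "http://example.org/", 0.7),
    ("dc:title", "Home page", 0.6),
]
diverse = summarize(statements, size=2)
plain = summarize(statements, size=2, penalty=1.0)
```

With the penalty, the second pick is the dc:title statement and not a second FOAF statement; with `penalty=1.0` the selection degenerates to the plain top-2 by score.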