WWW 2008 / Poster Paper · April 21-25, 2008 · Beijing, China

Temporal Views Over RDF Data

Geetha Manjunath, Hewlett-Packard Labs, Bangalore, India, +91-080-2205-2259, geetham@hp.com
R. Badrinath, Hewlett-Packard, Bangalore, India, +91-80-2516-5005, badrinath@hp.com
Craig Sayers, Hewlett-Packard Labs, Palo Alto, US, +1-650-857-3838, craig.sayers@hp.com
Venugopal K S, Hewlett-Packard, Bangalore, India, +91-80-2516-5005, venuks@hp.com

ABSTRACT
Supporting fast access to large RDF stores has been one of the key challenges in enabling use of the Semantic Web in real-life applications, more so in sensor-based systems where large amounts of historic data need to be stored. We propose a semantics-based temporal view mechanism that enables faster access to time-varying data by caching in memory only the required subset of RDF triples. We describe our experience of implementing such a framework in the context of a wide area network monitoring system. Our preliminary results show that our solution significantly improves client access time and scales well for moderate data sets.

While view materialization is well understood for traditional relational databases, it remains an active research area for XML and RDF stores. Further, with current methodologies [2,3,5], operating on custom snapshots over time-varying data requires not only a sequence of queries (in SQL or SPARQL) but also extensive program logic to update the snapshot as the data changes over time. We introduce a simple method of generating and operating on snapshots through a mechanism that we call temporal views. Here, an application specifies a live snapshot of interest declaratively and then issues only simple queries over this snapshot. It is, naturally, the responsibility of the runtime system to update the snapshot when the data changes. In this paper, we propose a temporal view language to enable a semantic description of application-relevant data subsets.
We briefly describe the design and implementation of our runtime framework that supports live temporal views over RDF data. We also present preliminary results showing improved query efficiency through snapshot generation, using data from a wide area network monitoring system [1] over PlanetLab [6].

Categories and Subject Descriptors
D.2.11 [Software Architectures]; D.2.13 [Reusable Software]

General Terms
Languages, Performance, Experimentation, Design

Keywords
Semantic Web, views, time varying data, sensors, RDF, OWL, networking, data, Jena, Joseki, SPARQL, ontology, snapshots

2. OUR SOLUTION
To keep business applications less sensitive to schema changes in sensor data, and to support inference over raw data, we decided to store the sensor data as RDF triples. On top of this RDF store sits the core of our solution: a view mechanism that maintains a relevant subset of the base data for every application. A client application first declaratively specifies the data subset of interest and then uses SPARQL queries to retrieve the required information from the materialized view (itself an RDF model). The view generator uses optimized SPARQL queries over the base data to keep the view model live.

Our new view specification language, based on OWL, brings the power and expressivity of ontology languages to view specification. Using this language, an application can define a view in terms of the semantics of the domain and the temporal properties of the data. Every view specification includes:
a) Part of the schema of interest (the "TBox" in description logic)
b) Instances of data of interest (the "ABox" in description logic)
c) The temporal aspects of the snapshot of interest.
A temporal class defined in the specification gives the temporal aspect of the selected subset of data (say, only the latest n events, or the events collected in the last m minutes), based on a given time property (a view:Time).
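To make the three parts of a view specification concrete, the following Python sketch models a specification as a plain data structure and applies it to a list of triples. It is not the authors' OWL-based language or their SPARQL-driven view generator; it is a minimal in-memory illustration under stated assumptions: triples are `(subject, predicate, object)` string tuples, timestamps are numeric seconds, and the names `ViewSpec`, `materialize`, `ex:DelayMeasurement` and `ex:timestamp` (standing in for the view:Time property) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
RDF_TYPE = "rdf:type"


@dataclass
class ViewSpec:
    """Hypothetical stand-in for an OWL view specification."""
    classes: Set[str]                       # (a) schema of interest (TBox subset)
    time_property: str                      # property playing the role of view:Time
    latest_n: Optional[int] = None          # (c) keep only the latest n events...
    window_minutes: Optional[float] = None  # ...and/or events from the last m minutes


def materialize(triples: List[Triple], spec: ViewSpec, now: float) -> List[Triple]:
    # (b) instances of interest: subjects typed with one of the selected classes
    selected = {s for s, p, o in triples if p == RDF_TYPE and o in spec.classes}
    # Timestamp (in seconds) of each selected instance; untimed instances drop out
    times = {s: float(o) for s, p, o in triples
             if p == spec.time_property and s in selected}
    ordered = sorted(times, key=lambda s: times[s], reverse=True)  # newest first
    if spec.window_minutes is not None:
        cutoff = now - spec.window_minutes * 60
        ordered = [s for s in ordered if times[s] >= cutoff]
    if spec.latest_n is not None:
        ordered = ordered[:spec.latest_n]
    keep = set(ordered)
    # Copy every triple about a kept instance into the view model
    return [t for t in triples if t[0] in keep]


# Usage: three hypothetical delay measurements and one unrelated host
base = [
    ("ex:m1", RDF_TYPE, "ex:DelayMeasurement"), ("ex:m1", "ex:timestamp", "100"),
    ("ex:m2", RDF_TYPE, "ex:DelayMeasurement"), ("ex:m2", "ex:timestamp", "200"),
    ("ex:m3", RDF_TYPE, "ex:DelayMeasurement"), ("ex:m3", "ex:timestamp", "300"),
    ("ex:h1", RDF_TYPE, "ex:Host"),
]
latest_two = materialize(
    base, ViewSpec({"ex:DelayMeasurement"}, "ex:timestamp", latest_n=2), now=300)
last_minute = materialize(
    base, ViewSpec({"ex:DelayMeasurement"}, "ex:timestamp", window_minutes=1), now=300)
```

Instances with no value for the time property are excluded, and the view is recomputed from scratch on each call; a live view, as described here, would instead be kept up to date as new triples arrive.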
The rest of the specification, described in [4], selects what else comes along with these instances (e.g., when accessing network routes, should host information also be accessible?). Thus, for every live snapshot required by a new application, a temporal class is defined in its view specification. Space constraints prevent a complete description of our view language here; however, an example view specification is shown in Figure 1 and is described in the context of our test application in Section 3.

1. INTRODUCTION
Business applications are increasingly using sensor data to provide intelligent services. For example, analyzing the performance of a network requires repeatedly measuring round-trip packet delays to provide timely and accurate knowledge of highly dynamic network properties [1]. We believe an RDF-based store for sensor data would make the system flexible enough to adapt to schema variations caused by changes in the sensor infrastructure, and would enable sophisticated applications that infer new information using an ontology. Today, as the cost of storage and processing has fallen, one can afford to store large amounts of information for long periods of time, and deploy innovative new applications and services to mine this huge data store. Fast access to such large RDF stores is therefore necessary.

A key characteristic of sensor data is that it is time varying, and its temporal aspect is an important facet of the applications processing it. A typical operation in such applications is extracting temporal snapshots of the data. For some applications only specific snapshots of the data are relevant (say, the latest events), while others may create live snapshots for efficiency reasons. Thus, snapshot generation is an important component of sensor-data-based applications. This creates a need for a simple way of describing and extracting a relevant subset of information (materialized views) over large RDF stores.
Copyright is held by the author/owner(s). WWW 2008, April 21-25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04.

Once an application-specific view definition is in place, most of the data processing required by the application can be expressed in SPARQL, the RDF query language, thus reducing application development time. To support temporal views, we have also introduced a novel runtime paradigm: we define "notional" equivalence classes that partition the data instances, perform a temporal ordering within each class, and pick from each class the instances that meet the specified temporal criteria for the materialized view.
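The three-step runtime paradigm (partition, order within each class, pick) can be sketched in Python as follows. The sketch assumes, hypothetically, that each instance is a round-trip delay probe and that its equivalence class is the (source, destination) route; the function `temporal_view`, the `Probe` record and its fields are illustrative names, not part of the described framework. Like any sketch, it recomputes the view from scratch rather than maintaining it incrementally as data changes.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, NamedTuple, Optional, TypeVar

T = TypeVar("T")


def temporal_view(instances: List[T],
                  class_key: Callable[[T], Hashable],
                  time_of: Callable[[T], float],
                  latest_n: Optional[int] = None,
                  since: Optional[float] = None) -> Dict[Hashable, List[T]]:
    # Step 1: partition the instances into notional equivalence classes
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for inst in instances:
        groups[class_key(inst)].append(inst)
    view: Dict[Hashable, List[T]] = {}
    for key, members in groups.items():
        # Step 2: temporal ordering within the class, newest first
        ordered = sorted(members, key=time_of, reverse=True)
        # Step 3: keep only the members that meet the temporal criteria
        if since is not None:
            ordered = [m for m in ordered if time_of(m) >= since]
        if latest_n is not None:
            ordered = ordered[:latest_n]
        view[key] = ordered
    return view


class Probe(NamedTuple):  # hypothetical round-trip delay measurement
    src: str
    dst: str
    time: float
    rtt_ms: float


def route(p: Probe) -> Hashable:  # hypothetical class key: the measured route
    return (p.src, p.dst)


probes = [Probe("hostA", "hostB", 10, 41.0), Probe("hostA", "hostB", 20, 43.5),
          Probe("hostA", "hostC", 15, 80.2), Probe("hostA", "hostB", 30, 40.1)]
latest_per_route = temporal_view(probes, route, lambda p: p.time, latest_n=1)
recent_per_route = temporal_view(probes, route, lambda p: p.time, since=15)
```

Choosing the class key is where the domain semantics enter: keying by route yields the latest measurement for every route, whereas placing all instances in a single class reduces to a plain "latest n events" view.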