WWW 2008 / Refereed Track: Semantic / Data Web - Semantic Web I April 21-25, 2008 · Beijing, China Structured Objects in OWL: Representation and Reasoning Boris Motik University of Oxford Oxford, UK Bernardo Cuenca Grau University of Oxford Oxford, UK Ulrike Sattler University of Manchester Manchester, UK ABSTRACT Applications of semantic technologies often require the representation of and reasoning with structured objects --that is, ob jects comp osed of parts connected in complex ways. Although OWL is a general and p owerful language, its class descriptions and axioms cannot b e used to describ e arbitrarily connected structures. An OWL representation of structured ob jects can thus b e underconstrained, which reduces the inferences that can b e drawn and causes p erformance problems in reasoning. To address these problems, we extend OWL with description graphs, which allow for the description of structured ob jects in a simple and precise way. To represent conditional asp ects of the domain, we also allow for SWRLlike rules over description graphs. Based on an observation ab out the nature of structured ob jects, we ensure decidability of our formalism. We also present a hyp ertableau-based decision procedure, which we implemented in the HermiT reasoner. To evaluate its p erformance, we have extracted description graphs from the GALEN and FMA ontologies, classified them successfully, and even detected a modeling error in GALEN. Categories and Subject Descriptors I.2.4 [KR Formalisms and Methods]: OWL General Terms Theory, Languages 1. INTRODUCTION Ontologies are nowadays used in disciplines as diverse as biology [20], medicine[18], astronomy [4], and agriculture [21]. A de facto standard for ontology modeling is the Web Ontology Language (OWL),1 so most ontologies in these domains were either develop ed from the start using OWL or translated into OWL from other formalisms. OWL is an expressive language capable of supp orting diverse applications. Its logical underpinning is given by description logics (DLs), which provide OWL with a clean model-theoretic semantics, well-understood reasoning problems, and p owerful reasoners. 1 In this pap er, we focus on OWL DL--the most expressive of the decidable languages of the OWL family. Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2008, April 21­25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04. Structured objects --that is, ob jects comp osed of other, p ossibly interrelated ob jects--p ose some well-known problems to OWL and DLs [3, 1, 19]. Such ob jects ab ound, for example, in molecular biology and the clinical sciences. Clinical ontologies such as GALEN [22], the Foundation Model of Anatomy (FMA) [18], the National Cancer Institute (NCI) Thesaurus [7], and SNOMED CT [23] are currently b eing used in large-scale applications, and they describ e numerous structured ob jects. For example, GALEN models the heart as consisting of the left and the right ventricles, the two atria, and the valves, all of which participate in complex relationships, such as "the two ventricles of a heart are separated by the intraventricular septum." OWL can b e used to describ e domains consisting of an arbitrary or even infinite numb er of ob jects, but it only allows for axioms that can connect these ob jects in a certain tree-like manner. In other words, OWL enjoys (a variant of ) the tree model property [24]: if an OWL ontology has a model, then it has a model with a tree-like (or forest-like) relational structure as well. This prop erty is resp onsible for the decidability of OWL reasoning [24]; however, it prevents sufficiently accurate description of complex structured objects, since schema-level axioms in OWL cannot describ e arbitrary relational structures. Consider the previously mentioned diamond-shap ed structure involving a heart, its right and left ventricles, and a septum. In addition to a model that corresp onds to the structure in which the ob jects are connected as exp ected, each schema-level description of the heart in OWL will also have a model where one heart has two septa, each as a part of the left and the right ventricle, resp ectively. Thus, certain consequences of the diamondshap ed structure cannot b e drawn from its formulation in OWL. For example, we cannot conclude that, if the right ventricle has a p erforated septum, the left ventricle has a p erforated septum as well. To address this lack of expressive p ower, in Section 4 we prop ose an extension of OWL for modeling structured objects using description graphs. Such graphs consist of vertices lab eled with atomic concepts and edges lab eled with atomic roles. According to our prop osed model-theoretic semantics, these graphs are class-level statements that sp ecify general patterns of connections b etween ob jects. In addition, we allow for SWRL-like rules [8] to enable the description of conditional statements ab out graphs. Extending DLs with axioms that can enforce arbitrary structures easily leads to undecidability [12]. Our formalism, however, is decidable b ecause it can represent only structured ob jects whose numb er of parts is b ounded. In prac- 555 WWW 2008 / Refereed Track: Semantic / Data Web - Semantic Web I tice, structured ob jects are usually modeled up to a certain level of granularity, which naturally determines this b ound. In Section 5, we present a reasoning algorithm for the case where the OWL part is expressed in S HI Q [10]; it should, however, b e p ossible to extended the algorithm to S HOI Q [9] and hence cover OWL DL. We thus obtain a p owerful, decidable, and practicable language that combines two complementary formalisms: unb ounded but tree-like structures can b e describ ed using standard OWL axioms, and the naturally b ounded structured parts can b e describ ed using arbitrarily connected description graphs and rules. We have implemented our algorithm in the DL reasoner HermiT [16].2 The validation of our approach is currently difficult due to the lack of test data. Thus, we have devised an algorithm that extracts description graphs from OWL ontologies, and have applied it to GALEN and FMA. The resulting ontologies should b e treated with caution; however, domain exp erts have confirmed that substantial parts of the ontologies reflect the actual human anatomy. Our transformation can thus b e a starting p oint for a more comprehensive remodeling using description graphs. Finally, the ontologies are sufficiently complex to allow us to estimate the practicability of reasoning. We present the transformation algorithm in Section 6. In Section 7, we discuss the results obtained by classifying the transformed ontologies. Our transformation allowed us to discover a modeling error in GALEN, which we take as indication that our formalism can indeed b e useful in practice. Furthermore, classification times for the transformed ontologies are of similar orders of magnitude as for the original ontologies despite the fact that our formalism adds considerable expressive p ower to OWL. Due to lack of space, proofs and certain technical details are included in the accompanying technical rep ort [13]. April 21-25, 2008 · Beijing, China arity one, and roles and have arity two. An atom over has the form P (x1 , . . . , xn ), where P is a predicate of arity n and xi are variables. Atoms with the equality predicate are written as t1 t2 . A rule r is an expression of the form B1 . . . Bn H1 . . . Hm (1) where n 0, m 0, and Bi and Hj are atoms. The set of atoms {B1 , . . . , Bn } is called the antecedent, and the set of atoms {H1 , . . . , Hm } is called the consequent. A program P is a finite set of rules. A rule r is interpreted in an interpretation I as a standard first-order implication. 3. MOTIVATION To understand the limitations of modeling structured objects in OWL, let us consider modeling the anatomy of the heart shown in Figure 1(a). This example has b een derived by reconstructing the intention b ehind the axioms describing the heart in GALEN. We next consider p ossibilities for a logical interpretation of the figure. Figure 1(a) could b e represented in OWL using an ABox A. ABox assertions, however, represent concrete data; thus, A would represent the structure of one particular heart. In this pap er, we are concerned with modeling structured objects at the schema level --that is, we want to describ e the general structure of al l hearts. We should b e able to instantiate such a description many times. For example, if we say that each patient has a heart, then, for each concrete patient, we should instantiate a different heart, each of the structure shown in Figure 1(a). This clearly cannot b e achieved if we describ e the structure of the heart using ABox assertions. Consequently, GALEN, SNOMED CT, and NCI contain only schema-level axioms and no ABox assertions. We can give a logical, schema-level interpretation to Figure 1(a) by treating vertices as concepts and arrows as participation constraints sp ecifying relationships b etween concepts. For example, LeftSideOfHeart and AorticValve are concepts and the arrow b etween them states that each left side of the heart has an aortic valve as a structural comp onent. Participation constraints can b e represented using existential quantification, which can b e encoded in OWL using axioms of the form (2). Let K b e a DL knowledge base containing the following axioms. LeftSideOfHeart hasStructuralComponent .AorticValve AorticValve hasAlphaConnection .LeftVentricle LeftSideOfHeart hasSolidDivsion .LeftVentricle (2) (3) (4) 2. PRELIMINARIES A S HI Q signature is a triple = (NC , NR , NI ) consisting of mutually disjoint sets of atomic concepts NC , atomic roles NR , and individuals NI . Let S b e an atomic role, A an atomic concept, and n a nonnegative integer. S HI Q roles and concepts are defined using the following grammar: R S | S- C | | A | ¬C | C1 C2 | C1 C2 | R.C | R.C | n R.C | n R.C A TBox T is a finite set of role inclusions R1 R2 , transitivity axioms Trans(R), and general concept inclusions (GCIs) C1 C2 . An extensional ly reduced ABox A is a finite set of assertions (¬)A(a), S (a, b), a b, and a b. A knowledge base K is a pair (T , A). To ensure decidability of reasoning, certain restrictions apply on the usage of roles in at-most and at-least concepts n R.C and n R.C [10]. An interpretation I is a first-order relational structure. The definition of satisfaction of axioms in I can b e found in [10]. I is a model of K, written I |= K, if it satisfies all axioms of K. The basic inference problem for S HI Q is checking satisfiability of K--that is, checking whether a model of K exists. The principles for extending DLs with rules can b e found in [12, 5, 8]. For a S HI Q signature , let NV b e a countably infinite set of variables disjoint from NI . A predicate is a concept, a role, or the equality predicate . Concepts have 2 Let I b e an interpretation that corresp onds to Figure 1(a) in the obvious way. Clearly, I is a model of K, which justifies the formalization of Figure 1(a) by axioms (2)­(4). Such a schema-level representation of a heart can b e put to use in many ways. We might represent knowledge ab out various heart conditions; for example, if the aortic valve suffers from aortic regurgitation (AR), then the left ventricle suffers from left ventricular hyp ertrophy (LVH): AorticValve HasAR hasAlphaConnection .HasLVH (5) http://web.comlab.ox.ac.uk/oucl/work/boris.motik/HermiT/ We might exp ect to derive from (2)­(5) that, if the aortic valve of the left side of the heart suffers from aortic regur- 556 WWW 2008 / Refereed Track: Semantic / Data Web - Semantic Web I April 21-25, 2008 · Beijing, China (a) Intended Structure (b) Unintended Tree Structure (c) Unintended Infinite Structure Figure 1: Different Models of the Heart in GALEN gitation, then the left ventricle suffers from hyp ertrophy: LeftSideOfHeart hasStructuralComponent .(AorticValve HasAR ) hasSolidDivision .HasLVH (6) rep etitive and much larger than the intended models, which, according to our exp erience, is the main reason why OWL reasoners still cannot process ontologies such as FMA and certain versions of GALEN. To avoid such problems, we need to extend K with additional axioms that make al l models of K corresp ond as much as p ossible to the intended conceptualization shown in Figure 1(a). Such axioms, however, cannot b e stated in OWL, for reasons we explain next. OWL can represent unbounded or even infinite domains, which is appropriate in many cases. For example, in the domain of p eople, we should not make any assumptions ab out the numb er of p eople in the world. In other words, the domain of all p eople does not exhibit a natural bound on its size. Thus, we can represent the fact that every p erson has exactly two parents who are p ersons: Person 2 hasParent .Person 2 hasParent . (9) Reasoning with such axioms is not straightforward. A model containing one p erson must contain two parents 1 and 2 , each of which requires the existence of two additional parents and so on. Effectively, we obtain a model that is similar to the one shown in Figure 1(c). To ensure termination of the model construction outlined in the previous paragraph, the structure of the axioms allowed in OWL is restricted such that the language exhibits (a variant of ) the tree model property [24]: whenever a knowledge base K has a model, it also has a model of a certain tree shap e. The relationship b etween the left side of the heart, the aortic valve, and the left ventricle in Figure 1(a) is, however, triangular and cannot b e represented as a tree. Hence, if we want to ensure that the ventricles whose existence is implied by (3) and (4) are the same in every model of K, we must leave the confines of OWL and DLs. Certain rule formalisms can axiomatize nontree structures. Unfortunately, (6) does not follow from K: axioms (3) and (4) imply the existence of two left ventricles, but no axiom in K states that these two ventricles are necessarily the same ob ject. Thus, an interpretation I corresp onding to Figure 1(b) is also a model of K. In I , even if the aortic valve has aortic regurgitation, the second left ventricle is unaffected. Hence, I |= (6), so K |= (6) as well. The knowledge base K is thus underconstrained: some models of K do not corresp ond to the actual structure of the heart shown in Figure 1(a). This discrepancy can prevent us from drawing some quite reasonable conclusions, such as (6). Furthermore, it can also cause problems with the p erformance of reasoning. For example, we might use axioms (4) and (7)­(8) to describ e the relationships b etween the left side of the heart, the left ventricle, and the mitral valve. LeftVentricle isBetaConnectionOf .MitralValve MitralValve isStructuralComponentOf .LeftSideOfHeart (7) (8) While admitting a model corresp onding to Figure 1(a), these axioms do not state that the mitral valve in (7) is a structural comp onent of the "initial" left side of the heart. Hence, the interpretation from Figure 1(c) is also a model of these axioms. In fact, the latter model is "canonical" in the sense that it reflects the least amount of information derivable from the axioms. In order to disprove an entailment from these axioms, an OWL reasoner will try to construct such a "canonical" model. In practice, such models can b e highly 557 WWW 2008 / Refereed Track: Semantic / Data Web - Semantic Web I For example, the following SWRL [8] rule can b e used to make the two ventricles from Figure 1(b) the same: LeftSideOfHeart (x) hasStructuralComponent (x, y ) (10) hasAlphaConnection (y , z ) LeftVentricle (z ) hasSolidDivsion (x, w) LeftVentricle (w) z w This, however, has significant drawbacks. From the standp oint of modeling, such a solution is quite complex, as it requires the modeler to anticipate which ob jects need to b e made the same. The fact that the two left ventricles are the same follows from the complex interaction b etween axioms (2)­(4) and (10), and is thus not represented explicitly. Clearly, such a modeling formalism is likely to b e hard to use and susceptible to modeling errors. From the standp oint of automated reasoning, the extension of OWL with SWRL is undecidable [8], which is a significant imp ediment to the adoption of SWRL in practice. SWRL-like rules can, however, naturally express certain conditional asp ects of structured ob jects. For example, if the septum has a ventricular septal defect, then there is a blood flow from the left to the right ventricle: IntraventricularSeptum (x) HasVSD (x) hasLayer (y1 , x) LeftVentricle (y1 ) hasLayer (y2 , x) RightVentricle (y2 ) hasBloodFlow (y1 , y2 ) April 21-25, 2008 · Beijing, China (11) The variables in the antecedent of this rule are connected in a non-tree-like way, so such a rule cannot b e expressed in OWL. If we, however, deal with arbitrarily connected structures, such as the one shown in Figure 1(a), non-tree-like antecedents are essential for drawing the correct inferences. Various decidable combinations of DLs and rules cannot b e used for schema modeling. For example, the DL-safe rules [15] are syntactically restricted such that they apply only to the explicitly named ob jects. Role-safe [12] and weakly safe [17] rules also imp ose restrictions that prevent the application of the rules to arbitrary elements of the domain, and similar restrictions are also employed by various nonmonotonic rule extensions of OWL [6, 17, 14]. While these are quite useful in query answering, they cannot b e used to derive new conclusions from the schema. The DL S ROI Q [11] and the OWL 1.1 extension of OWL DL extend OWL with complex role inclusions of the form R1 . . . Rn S , restricted appropriately to ensure decidability. Such axiom solve some of the problems; however, they still cannot axiomatize arbitrary structures such as the one in Figure 1(a) or express axioms such as (11). Semantically, G = (V , E , ) should b e understood as a "template" for a fragment of a model. Let I b e a model and A an atomic concept lab eling some graph vertex i V . If I contains an ob ject such that AI , then I must also contain an instance of G in which corresp onds to i. For example, if I contains an instance of the Heart concept, then I must contain a relational structure corresp onding to Figure 1(a) in which corresp onds to the top-most vertex. As discussed in Section 3, extending DLs with constructs that allow the description of arbitrarily connected structures of unb ounded size easily leads to undecidability. In practice, structured ob jects are usually modeled up to a certain level of granularity, which naturally determines this b ound. For example, a human b ody consists of a certain numb er of organs. These organs might b e decomp osed into smaller parts; however, each such decomp osition is b ounded, so the entire model of human anatomy requires a b ounded numb er of objects. Even though the numb er of required ob jects may b e large and difficult to determine by hand, the fact that the domain is b ounded is intrinsic to the modeling problem. The reasoning algorithm presented in Section 5 uses this b ound to ensure termination even on arbitrarily connected, nontree-like structures. We assume that the set of atomic roles is divided into a set of atomic tree roles NRt and a set of atomic graph roles NRg . A graph-extended DL know ledge base is a 4-tuple K = (T , G, P , A) where T is a DL TBox, G is a description graph, P is a set of rules, and A is an ABox. Furthermore, T is allowed to refer only to tree roles, G and P are allowed to refer only to the graph roles, and A is allowed to refer to b oth graph and tree roles. For example, let K = (T , G, P , A) b e a graph-extended DL knowledge base with the following comp onents. Let T contain the axioms (12)­(14). Intuitively, axiom (12) says that each p erson has a parent and a heart; axiom (13) ensures that the heart of each sufferer from aortic regurgitation is an instance of HasAR ; and axiom (14) says that, on each aortic valve suffering from aortic regurgitation, some p erson is p erforming a surgery on it. Person hasParent .Person hasHeart .Heart AR Sufferer hasHeart .HasAR AorticValve HasAR performsSurgeryOn - .Person (12) (13) (14) Let G corresp ond to Figure 1(a), let P contain the rule (15) that propagates the HasAR concept over the structural comp onents of the heart, and let A contain the assertions Person (a) and AR Sufferer (a). HasAR (x) hasStructuralComponent (x, y ) (15) HasAR (y ) The semantics of graph-extended KBs ensures that each model I of K is of the form shown in Figure 2. I consists of two distinct parts. The tree backbone consists of objects (shown as large squares) connected through tree roles (shown using thick lines), and it is constructed using the standard DL axioms in T . As discussed in Section (3), the numb er of p ersons is not naturally b ounded so, if we want a decidable formalism, we must employ standard DL restrictions. Apart from the tree backb one, I also contains arbitrarily connected but naturally b ounded graph instances, such as the structure of the heart of each p erson. Unlike 4. DESCRIPTION GRAPHS We now present an extension of OWL that addresses the problems from Section 3. 4.1 Basic Principles The main asp ect of a description of a structured ob ject is the connection b etween the ob ject's parts, which can naturally b e represented as a graph. Hence, we introduce the notion of a description graph G = (V , E , )--a directed graph in which each vertex i V is lab eled with a set of atomic concepts i and each edge i, j E is lab eled with a set of atomic roles i, j . For example, Figure 1(a) can b e understood as a description graph that describ es the heart. 558 WWW 2008 / Refereed Track: Semantic / Data Web - Semantic Web I April 21-25, 2008 · Beijing, China of integers cal led vertices, ( ii) E V × V is a set of edges, and ( iii) labels each vertex i V with a set of atomic concepts i NC , and each edge i, j E with a set of atomic graph roles i, j NRg . For an atomic concept A, VA is the set of vertices that contain A in their label: VA = {k V | A k }. We now define the notion of graph-extended DL knowledge bases. The definition of graph-regular rules ensures that each such rule can b ecome applicable only to ob jects from the same instance of the description graph G, which is required to obtain a decidable formalism. Definition 3. A rule of the form (1) is graph-regular if it uses only atomic concepts and graph roles and, for each pair of variables x1 and x2 occurring in r , some antecedent atom of r contains both x1 and x2 . A graph-extended DL knowledge base K = (T , G, P , A) is a 4-tuple where ( i) T is DL TBox over the signature (NC , NRt , NI ), ( ii) G is a description graph with vertices, ( iii) P is a finite set of graph-regular rules, and ( iv) A is an extensional ly-reduced ABox over (NC , NRt , NRg , NI ) that, apart from standard assertions, can also contain graph assertions of the form G(a1 , . . . , a ) for ai NI . Graph-regular rules can express conjunctive queries over G, so we do not consider query answering separately. We now formalize the semantics of description graphs. Definition 4. An interpretation I = (I , ·I ) interprets a description graph G = (V , E , ) with vertices as an ary relation GI (I ) . An interpretation I satisfies G, written I |= G, if al l of the fol lowing conditions hold. i-key property: for each 1 i , x1 , . . . , x , y1 , . . . , y I : x1 , . . . , x GI 1 x j = yj y1 , . . . , y GI xi = yi j Figure 2: A Typical Model of K in the case of axioms (2)­(4) and Figure 1(b), each graph instance is necessarily of the form as sp ecified by G in each such model I . Note that the tree backb one of I need not b e contiguous: the b ottom-most AorticValve ob ject av can b e connected to other ob jects through tree roles. To summarize, for a graph-extended knowledge base K, we can consider only models that consist of graph instances, connected among themselves and with other ob jects through tree roles. Decidability of the formalism is now ensured by the separation of the roles into tree and graph ones. The axioms in T can propagate constraints across tree roles just like in standard DLs; however, we can adapt the blocking technique [10] to ensure termination of model construction. Furthermore, the rules in P can propagate constraints within a graph; however, the size of the graph is naturally b ounded, so this does not cause termination problems either. Our way of obtaining decidability is related to fusions of abstract description systems (ADSs) [2], which allow for the combination of different modal and description logics. The comp onent ADSs can share concepts; however, the interaction b etween them through roles is restricted to ensure decidability. Our separation of roles into graph and tree ones is similar in spirit. Bounded structures and rules, however, cannot directly b e expressed as an ADS. In addition, we present a practical decision procedure. Disjointness property: x1 , . . . , x , y1 , . . . , y I : x1 , . . . , x GI 1 x i = yj y1 , . . . , y GI i 0. -rule If s t A and s = t then A1 := mergeA (s t) if t is a named individual or if s is a descendant of t; or A1 := mergeA (t s) otherwise. -rule -rule If 1. n R.C (s) A, 2. s is not blo cked in A, and 3. there are no individuals u1 , . . . , un such that {ar(R, s, ui ), C (ui ) | 1 i n} {ui uj | 1 i < j n} A, then A1 := A {ar(R, s, ti ), C (ti ) | 1 i n} {ti tj | 1 i < j n} where t1 , . . . , tn are fresh pairwise distinct tree successors of s. G-rule If 1. G|i (s) A for G a description graph with vertices, 2. s is not blo cked in A, and 3. there are no individuals u1 , . . . , ui-1 , ui+1 , . . . , u such that G(u1 , . . . , ui-1 , s, ui+1 , . . . , u ) A then A1 := A {G(t1 , . . . , ti-1 , s, ti+1 , . . . , t )} where t1 , . . . , ti-1 , ti+1 , . . . , t are fresh pairwise distinct graph individuals from the same graph cluster as s. If s s A or {A(s), ¬A(s)} A then A1 := A {}. Note: A is a generalized ABox, R is a set of admissible rules, and NA is the set of individuals o ccurring in A. Our translation cannot correctly handle axioms of the form A n R.B with n 2. Intuitively, such axioms might b e handled by creating n vertices in G, lab eling all of them with B , and then connecting the vertex of A with all the vertices of B using the role R. The situation, however, is not so simple if, in addition, we also have the axiom B m R.A. It is now not clear which vertices of the description graph lab eled with A to "reuse" to satisfy this axiom. Therefore, we decided to ignore such axioms. This is partly justified by the fact that GALEN and FMA--our main sources of inspiration and test data--do not contain n R.B concepts with n 2. In human anatomy, different ob jects of the domain are naturally given different names. For example, instead of an axiom Heart 2 hasStructuralComponent .SideOfHeart , (16) GALEN introduces explicit names for the left and the right side of the heart: Heart hasStructuralComponent .LeftSideOfHeart (17) Heart hasStructuralComponent .RightSideOfHeart (18) On ontologies with at-least restrictions, our algorithm simply treats each n R.B as R.B . It is natural to use numb er restrictions for modeling symmetric organs such as the kidney. On such an ontology, our algorithm produces a description graph containing just one copy of the ob ject, and the graph can then b e corrected by the modeler. Determining the sets NCg and NRg manually is not easy. According to our exp erience with GALEN and FMA, a good strategy is to manually identify a set of roles NRg that naturally b elong to the graph, and then to take NRg as the clo sure of NRg w.r.t. the explicit role inclusions from T1 . Then, we take NCg as the set of all concepts A and B occurring in an axiom A R.B T1 such that R NRg . Intuitively, if A and B are connected by a role that should b e included into the graph, then it is likely that A and B should b e included into the graph as well. This idea, however, requires some refinement. For example, GALEN contains the following axioms: LeftVentricle Ventricle RightVentricle Ventricle (19) (20) a description graph G containing three different vertices, each lab eled with one of these concepts. It is, however, counterintuitive for G to contain a Ventricle vertex: no ventricle as such exists on its own; rather, each concrete ventricle is either the left of the right ventricle. In fact, such a description graph G is unsatisfiable. Assume that an ob ject x as instance of LeftVentricle ; due to (19), x is also an instance of Ventricle . To satisfy the A-start prop erty for LeftVentricle , x must corresp ond to the i-th vertex of some instance of G; similarly, to satisfy the A-start prop erty for Ventricle , x must also corresp ond to the j -th vertex of some instance of G. Finally, b ecause LeftVentricle and Ventricle lab el different vertices of G, we have i = j , which then invalidates the disjointness prop erty of Definition 4. The concept Ventricle is thus an abstract concept : it is not meant to b e instantiated directly, but only through a sub class. Such concepts clearly do not b elong into a description graph. Hence, after computing NCg as describ ed in the previous paragraph, our algorithm classifies the input TBox T1 using standard DL reasoning; then, it removes from NCg all concepts that are not leaves in the resulting classification. Intuitively, if A is not a leaf concept in the classification of T1 , then A is likely to b e an abstract concept, so it should not b e added to G. A pseudo-code of the algorithm is given in [13], and the tool can b e downloaded from HermiT's Web site. 6.2 Transforming GALEN and FMA We applied the algorithm from Section 6.1 to the original version of GALEN; furthermore, FMA is a very large ontology, so we have applied our algorithm to a fragment of FMA that describ es the heart. Both ontologies can b e downloaded from HermiT's Web page. Table 2 summarizes information ab out the original and the transformed ontologies. Our transformation clearly leads to a change in the semantics of the ontology, and some information is lost in the process. Many parts of the resulting description graph, however, corresp ond with the intuitive descriptions of the anatomy of the b ody. For example, the graph shown in Figure 1(a) has b een extracted from the transformed ontology. 7. EVALUATION AND DISCUSSION To evaluate our approach, we have classified the original ontologies using HermiT, transformed them using the algorithm from Section 6 into graph-extended KBs, and classified the resulting KBs using the reasoning algorithm pre- Let us assume that NCg contains Ventricle , LeftVentricle , and RightVentricle . The core transformation then generates 562 WWW 2008 / Refereed Track: Semantic / Data Web - Semantic Web I Table 2: Information about Test Ontologies Total number of concepts: Total number of roles: Total number of GCIs: GCIs discarded in the transformation: With both a tree and a graph role: With existentials on abstract concepts: Translated GCIs: Into the description graph: Into rules over the graph: Into the DL TBox: With existentials on tree roles: With universals on tree roles: Involving concept names only: Vertices in the description graph: Edges in the description graph: GALEN 2748 413 6962 320 74 246 6642 680 155 5807 1741 952 3114 325 667 FMA 430 38 3479 328 0 328 3151 2966 1 184 16 0 168 342 1076 April 21-25, 2008 · Beijing, China sented in Section 5. We now present the p erformance results and discuss the classification results. 7.1 Performance Results We p erformed the exp eriments using a standard laptop with 1 GB of RAM. The classification of the original version of GALEN and the fragment of FMA took 129 s and 57 s, resp ectively; furthermore, the classification of the transformed ontologies took 781 s and 6 s, resp ectively. The increase in the classification time for GALEN is partly due to the fact that our implementation of the reasoning algorithm in Section 5 is still very prototypical. In the case of FMA, the classification times are substantially lower b ecause most of the original ontology is translated into the graph, so the generated models are much smaller. Our p erformance results show that, even with a very prototypical implementation, we can process complex ontologies, which we take as indication that our approach is practically feasible. clared functional. Since GALEN is underconstrained, this does not cause the inconsistency of either concept, so this error has not b een detected so far. The description graph produced by our transformation, however, contains one node for the patella and one for each retinaculum; furthermore, b oth retinacula are connected through isAtOtherEndOf to the same patella. Since isAtOtherEndOf is functional, the retinacula should b e the same, which invalidates the disjointness prop erty for the graph (see Definition 4) and makes Patel la unsatisfiable. In the case of FMA, we did not obtain any new subsumption relationships. This is due to the fact that most of the subsumption relationships in FMA are represented explicitly as axioms of the form A B where A and B are atomic concepts. For example, the fact that the heart is an organ is represented explicitly with the axiom Heart Organ , and it is not derived from the structure of the heart; clearly, such inferences are p erformed in the same way on b oth tree-like and nontree structures. As explained in Section 6, our algorithm discards some axioms from the ontology. We compared the class hierarchies of the original and the graph-extended versions of GALEN. In total, 361 subsumption relationships were lost, such as Femur BodySpace (the femur is a b ody space), and InteratrialSeptum TwoAndAHalfDimensionalStructure (the interatrial septum of the heart is a structure with two and a half dimensions). All these entailments involve an abstract concept, so their loss is not surprising since the transformation algorithm discards GCIs that involve an abstract concept and an existential on a graph role. No information ab out concrete concepts has b een lost, though. In contrast, in the case of FMA we did not lose any subsumption relationships. As explained b efore, the reason is that the structural information in FMA largely does not influence subsumption. 7.2 Changes in the Semantics The transformed ontologies are more constrained than the original ones, so we exp ect to obtain new entailments. In the case of GALEN, we discovered a concept that is satisfiable in the original version of the ontology, but is unsatisfiable in the transformed ontology, which revealed a modeling error in GALEN. The problem occurs in the representation of the patella--a b one that is connected to certain tendons through two retinacula, represented using the concepts LateralPatel laRetinaculum and MedialPatel laRetinaculum . GALEN describ es the relationship b etween the patella and the two retinacula as follows: LateralPatel laRetinaculum hasOtherEndAt .Patel la (. . .) MedialPatel laRetinaculum hasOtherEndAt .Patel la (. . .) hasOtherEndAt isAtOtherEndOf - 1 isAtOtherEndOf (21) (22) (23) (24) 7.3 Discussion Our exp erience with GALEN and the discussions we had with the authors of GALEN lead us to conclude that our formalism represents the anatomical structures in the human b ody in a way that is closer to the modelers' intention than the original OWL axioms.3 The fact that we found a modeling error in GALEN leads us to b elieve that our formalism and its semantics are based on "reasonable" assumptions. Furthermore, capturing the semantics of abstract concepts and axioms involving them prop erly is likely to b e the most imp ortant op en problem. We briefly discuss p ossibilities for addressing it. Consider the following axiom in GALEN that is eliminated by the transformation algorithm b ecause b oth AtrioventricularValve and Ventricle are abstract concepts: AtrioventricularValve hasAlphaConnection .Ventricle (25) According to the axioms ab ove, each patella is connected to b oth the lateral and the medial retinacula, but due to functionality of isAtOtherEndOf , the two must b e the same ob jects. Intuitively, this is an undesirable consequence, since the two retunaculae are in reality different ob jects; in other words, isAtOtherEndOf should probably not have b een de- Since b oth concepts in (25) are abstract, this axiom does not say anything ab out the structure of the concrete objects (i.e., the ob jects that are likely to b e included into a description graph). Thus, one might exp ect the actual relationship b etween valves and ventricles to b e describ ed for the concrete sub classes of AtrioventricularValve and Ventricle . Axiom (25) can then b e interpreted as a check which makes 3 Thanks to Alan Rector and Sebastian Brandt. 563 WWW 2008 / Refereed Track: Semantic / Data Web - Semantic Web I sure that this abstract relationship is concretized at a lower level. Another p ossibility is to interpret Ventricle disjunctively over its sub classes: each valve is connected to either left or the right ventricle, but we do not know which. Currently, it is not clear which interpretation is appropriate; in fact, the prop er interpretation of abstract concepts is made more difficult by the fact that whether a concept is abstract or concrete dep ends on the level of granularity. April 21-25, 2008 · Beijing, China [7] [8] 8. CONCLUSION [9] We have extended OWL with description graphs, which can b e used to describ e structured ob jects--ob jects consisting of parts connected in a complex, arbitrary way. We also allow for arbitrary SWRL-like rules over description graphs. Unlike most existing combinations of DLs and rules in which rules can b e used only for query answering [12, 15, 17, 6, 14], our rules also fully participate in schema reasoning. Based on an observation that many structured ob jects exhibit a natural b ound on their size, we derived a hyp ertableau reasoning algorithm for our formalism, which we implemented in the HermiT reasoner. To obtain suitable test data, we extracted description graphs out of GALEN and FMA medical terminologies. We successfully classified the resulting ontologies and even detected a modeling error. We see three op en problems for future research. First, graph-extended KBs should provide for several and not just one description graph, as this would allow breaking up a large graph into several more manageable parts. The main challenge is to identify an appropriate paradigm for sp ecifying relationships b etween different description graphs. Second, an adequate semantics for modeling abstract concepts at different levels of granularity is needed. Third, to allow for a wider users' community, we would need to extend ontology editors such as Prot´g´ with description graphs. ee [10] [11] [12] [13] [14] [15] [16] Acknowledgments We thank Alan Rector and Sebastian Brandt for providing us with comments from the domain exp erts' p ersp ective. [17] [18] 9. REFERENCES [19] [1] A. Artale, E. Franconi, N. Guarino, and L. Pazzi. Part-whole relations in ob ject-centered systems: An overview. Data Know ledge & Engineering, 20(3):347­383, 1996. [2] F. Baader, C. Lutz, H. Sturm, and F. Wolter. Fusions of Description Logics and Abstract Description Systems. Journal of Artificial Intel ligence Research, 16:1­58, 2002. [3] D. Calvanese, G. De Giacomo, and M. Lenzerini. Structured Ob jects: Modeling and Reasoning. In Proc. DOOD '95, pages 229­246, 1995. [4] S. Derriere, A. Richard, and A. Preite-Martinez. An Ontology of Astronomical Ob ject Typ es for the Virtual Observatory. In Proc. of the 26th meeting of the IAU, pages 17­18, 2006. [5] F. M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf. AL-log: Integrating Datalog and Description Logics. Journal of Intel ligent Information Systems, 10(3):227­252, 1998. [6] T. Eiter, T. Lukasiewicz, R. Schindlauer, and H. Tompits. Combining Answer Set Programming [20] [21] [22] [23] [24] with Description Logics for the Semantic Web. In Proc. KR 2004, pages 141­151, 2004. F. W. Hartel, S. de Coronado, R. Dionne, G. Fragoso, and J. Golb eck. Modeling a description logic vocabulary for cancer research. Journal of Biomedical Informatics, 38(2):114­129, 2005. I. Horrocks and P. F. Patel-Schneider. A Prop osal for an OWL Rules Language. In Proc. WWW 2004, pages 723­731, 2004. I. Horrocks and U. Sattler. A Tableaux Decision Procedure for S HOI Q. In Proc. IJCAI 2005, pages 448­453, 2005. I. Horrocks, U. Sattler, and S. Tobies. Practical Reasoning for Very Expressive Description Logics. Logic Journal of the IGPL, 8(3):239­263, 2000. O. Kutz, I. Horrocks, and U. Sattler. The Even More Irresistible SROIQ. In Proc. KR 2006, pages 68­78, 2006. A. Y. Levy and M.-C. Rousset. Combining Horn Rules and Description Logics in CARIN. Artificial Intel ligence, 104(1­2):165­209, 1998. B. Motik, B. Cuenca Grau, and U. Sattler. Representation of and Reasoning with Structured Ob jects in OWL. Technical rep ort, University of Oxford, UK, 2007. See first author's Web page. B. Motik and R. Rosati. A Faithful Integration of Description Logics with Logic Programming. In Proc. IJCAI 2007, pages 477­482, 2007. B. Motik, U. Sattler, and R. Studer. Query Answering for OWL-DL with Rules. Journal of Web Semantics, 3(1):41­60, 2005. B. Motik, R. Shearer, and I. Horrocks. Optimized Reasoning in Description Logics using Hyp ertableaux. In Proc. CADE-21, pages 67­83, 2007. R. Rosati. DL + log : A Tight Integration of Description Logics and Disjunctive Datalog. In Proc. KR 2006, pages 68­78, 2006. C. Rosse and J. V. L. Mejino. A reference ontology for biomedical informatics: the Foundational Model of Anatomy. Journal of Biomedical Informatics, 36:478­500, 2003. J. Seidenb erg and A. L. Rector. Representing Transitive Propagation in OWL. In Proc. ER 2006, pages 255­266, 2006. A. Sidhu, T. Dillon, E. Chang, and B. Singh Sidhu. Protein Ontology Development using OWL. In Proc. OWLED 2005, 2005. D. Soergel, B. Lauser, A. Liang, F. Fisseha, J. Keizer, and S. Katz. Reengineering Thesauri for New Applications: The AGROVOC Example. Journal of Digital Information, 4(4), 2004. W.D. Solomon, A. Rob erts, J. E. Rogers, C. J. Wroe C.J., and A. L. Rector. Having our cake and eating it too: How the GALEN Intermediate Representation reconciles . . .. In Proc. AMIA, pages 819­823, 2000. K. A Spackman. SNOMED RT and SNOMEDCT. Promise of an international clinical terminology. M.D. Computing, 17(6):29, 2000. M. Y. Vardi. Why Is Modal Logic So Robustly Decidable? In Proc. DIMACS Workshop, volume 31, pages 149­184, 1996. 564