Cloud Oriented Subgraph Identification
As social networks such as Facebook, Twitter, Myspace, LinkedIn, LiveJournal and others get increasingly larger, efficient indexes need to be developed in order to support very basic social network operations such as graph matching. We assume a social network consists of edge-labeled nodes and that a query can be expressed as a graph pattern (with some variables denoting nodes with unknown labels). Subgraph matching deals with the problem of finding all instances of the query graph in the social network. Subgraph matching is well known to be NP-hard, but unfortunately, this does not make the problem of subgraph matching go away.
COSI leverages compute clouds and is not only the first algorithm to perform subgraph matching using clouds, but is also the first algorithm to perform subgraph matching of complex query subgraphs against a social network database consisting of 778M edges in under a second. We have subsequently extended these results to social network databases with over a billion edges.
When using subgraph matching techniques in a cloud computing environment, one of the fundamental problems is that of graph partitioning, that is, distributing the social network across the available compute nodes. Graph partitioning is another NP-hard problem and we developed efficient algorithms that can load, effectively partition, and store a social network with 778 million edges in approximately 10 hours. Our experiments showed that the algorithm produces good partitions and improves the query time for complex subgraph matching queries.
In addition to social networks, COSI can also be used to execute graph matching queries expressed in the Semantic Web language SPARQL against a database in the W3C “Resource Description Framework” (RDF) database paradigm.
For additional information, please contact Dr. V.S. Subrahmanian.
Last updated: June 2010 by John Dickerson
This page will be updated as our work enters print. For information about receiving draft publications, technical reports, and conference presentations, please do not hesitate to contact team members.
The following sections may include links to restricted access material. Please do not hesitate to contact a group member for instructions regarding how to obtain a username and password.
For access to any software, please contact Dr. V.S. Subrahmanian.