Network Clustering Approximation Algorithm Using One Pass Black Box Sampling

Title: Network Clustering Approximation Algorithm Using One Pass Black Box Sampling
Publication Type: Journal Articles
Year of Publication: 2011
Authors: DuBois T, Golbeck J, Srinivasan A
Journal: arXiv:1110.3563
Date Published: 2011/10/16
Keywords: Computer Science - Social and Information Networks, Physics - Physics and Society
Abstract

Finding a good clustering of vertices in a network, where vertices in the same cluster are more tightly connected than those in different clusters, is a useful, important, and well-studied task. Many clustering algorithms scale well; however, they are not designed to operate on internet-scale networks with billions of nodes or more. We study one of the fastest and most memory-efficient algorithms possible: clustering based on the connected components of a random edge-induced subgraph. When the cost of a clustering is defined as its distance from such a random clustering, we show that this surprisingly simple algorithm gives a solution within an expected factor of two or three of optimal under either of two natural distance functions. In fact, this approximation guarantee holds for any problem in which there is a probability distribution on clusterings. We then examine the behavior of this algorithm in the context of social network trust inference.
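
The core idea in the abstract, clustering by the connected components of a random edge-induced subgraph, can be illustrated with a short sketch. This is not the authors' implementation. It assumes each edge is kept independently with probability p (the abstract does not say how the random subgraph is drawn), and the function name sample_clustering, its parameters, and the union-find bookkeeping are hypothetical choices made for illustration.

```python
import random


def sample_clustering(n, edges, p, seed=None):
    """Label vertices 0..n-1 by connected component of a random edge subgraph.

    Assumption (not from the paper): each edge is kept independently
    with probability p. Returns a list where labels[v] is the
    representative vertex of v's cluster.
    """
    rng = random.Random(seed)
    parent = list(range(n))  # union-find forest; each vertex starts alone

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Single pass over the edge stream: each edge is read once and is
    # either discarded or merged into the union-find structure.
    for u, v in edges:
        if rng.random() < p:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v

    return [find(v) for v in range(n)]
```

Under these assumptions, calling sample_clustering(6, [(0, 1), (1, 2), (3, 4)], p=1.0) keeps every edge and yields three clusters, {0, 1, 2}, {3, 4}, and {5}, while p=0.0 leaves every vertex in its own cluster. Memory use is one integer per vertex regardless of the number of edges, which matches the one-pass, memory-efficient setting the abstract describes.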

URL: http://arxiv.org/abs/1110.3563