Abstract | The World Wide Web is a huge, growing repository of information on a wide range of topics. It is also becoming important, commercially and sociologically, as a place of human interaction within different
communities.
In this paper we present an experimental study of the structure of the Web. We analyze link topologies
of various communities, and patterns of mirroring of content, on 1997 and 1999 snapshots of the Web.
Our results give insight into patterns of interaction within communities and how they evolve, as well as
patterns of data replication.
We also describe the techniques we have developed for performing complex processing on this large
data set, and our experiences in doing so. We present new algorithms for finding partial and complete
mirrors in URL hierarchies; these are also of independent interest for search and redirection. To
study and visualize link topologies of different communities, we have developed techniques to compact
these large link graphs with little loss of information.
|