Web Archiving: Organizing Web Objects into Web Containers to Optimize Access

Title: Web Archiving: Organizing Web Objects into Web Containers to Optimize Access
Publication Type: Reports
Year of Publication: 2007
Authors: Song S, JaJa JF
Date Published: 2007/10/09
Institution: Institute for Advanced Computer Studies, Univ. of Maryland, College Park
Keywords: Technical Report
Abstract

The web is becoming the preferred medium for communicating and storing
information pertaining to almost any human activity. However, it is an
ephemeral medium whose contents are constantly changing, resulting in
a permanent loss of part of our cultural and scientific heritage on a
regular basis. Archiving important web contents is a very challenging
technical problem due to its tremendous scale and complex structure,
extremely dynamic nature, and its rich heterogeneous and deep
contents. In this paper, we consider the problem of archiving a linked
set of web objects into web containers in such a way as to minimize
the number of containers accessed during a typical browsing session.
We develop a method that makes use of the notion of PageRank and
optimized graph partitioning to enable faster browsing of archived web
contents. We include simulation results that illustrate the
performance of our scheme and compare it to the common scheme
currently used to organize web objects into web containers.
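
The abstract names the ingredients of the method (PageRank, graph partitioning, and a count of containers touched per browsing session) but not its details. The following Python sketch is only an illustration of how those pieces could fit together, and is not the authors' algorithm: the power-iteration PageRank is the textbook version, the greedy `pack_containers` heuristic stands in for the report's optimized partitioning, and the function names, the `capacity` parameter, and the toy link graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over {page: [linked pages]}."""
    pages = sorted(set(links) | {q for ts in links.values() for q in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


def pack_containers(links, rank, capacity):
    """Hypothetical greedy stand-in for graph partitioning.

    Seeds each container with the highest-ranked unassigned page, then
    grows it breadth-first along outgoing links, taking higher-ranked
    neighbors first, until the container holds `capacity` pages.
    """
    assignment = {}
    container_id = 0
    for seed in sorted(rank, key=lambda p: (-rank[p], p)):
        if seed in assignment:
            continue
        assignment[seed] = container_id
        size = 1
        frontier = [seed]
        while frontier and size < capacity:
            page = frontier.pop(0)
            neighbors = sorted(
                (q for q in links.get(page, []) if q not in assignment),
                key=lambda q: (-rank[q], q),
            )
            for q in neighbors:
                if size >= capacity:
                    break
                assignment[q] = container_id
                size += 1
                frontier.append(q)
        container_id += 1
    return assignment


def containers_accessed(session, assignment):
    """Number of distinct containers touched by a sequence of page visits."""
    return len({assignment[p] for p in session})


# Toy link graph (assumed for illustration only).
links = {
    "home": ["a", "b"],
    "a": ["a1", "home"],
    "b": ["b1", "home"],
    "a1": ["a"],
    "b1": ["b"],
}
rank = pagerank(links)
assignment = pack_containers(links, rank, capacity=3)
print(assignment)
print(containers_accessed(["home", "a", "a1"], assignment))
```

Comparing `containers_accessed` for the same sessions under this packing and under a baseline such as packing pages in crawl order gives a rough feel for the kind of comparison the abstract describes.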

URL: http://drum.lib.umd.edu/handle/1903/7426