Probabilistic models of text and link structure for hypertext classification

Title: Probabilistic models of text and link structure for hypertext classification
Publication Type: Journal Articles
Year of Publication: 2001
Authors: Getoor L, Segal E, Taskar B, Koller D
Journal: IJCAI workshop on text learning: beyond supervision
Pagination: 24-29
Date Published: 2001
Abstract

Most text classification methods treat each document as an independent instance. However, in many text domains, documents are linked and the topics of linked documents are correlated. For example, web pages on related topics are often connected by hyperlinks, and scientific papers from related fields are commonly linked by citations.
We propose a unified probabilistic model for both the textual content and the link structure of a document collection. Our model is based on the recently introduced framework of Probabilistic Relational Models (PRMs), which allows us to capture correlations between linked documents. We show how to learn these models from data and use them efficiently for classification. Since exact methods for classification in these large models are intractable, we use belief propagation, an approximate inference algorithm. Belief propagation automatically induces a very natural behavior: our knowledge about one document helps us classify related ones, which in turn help us classify others. We present preliminary empirical results on a dataset of university web pages.
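To make the learn-then-propagate idea concrete, here is a minimal sketch. It is not the authors' PRM, which models directed hyperlinks and word occurrences explicitly. Instead it assumes a simplified pairwise model: each document has a per-topic "local evidence" vector (for example, from a text-only classifier), and each undirected link carries a topic-compatibility table. The table is learned by counting topic pairs on labeled links with Laplace smoothing, and sum-product loopy belief propagation then spreads evidence along links. The function names `estimate_compat` and `loopy_bp`, the smoothing parameter `alpha`, and all toy data are hypothetical.

```python
from collections import defaultdict


def normalize(v):
    """Scale a non-negative vector so it sums to 1."""
    s = sum(v)
    return [x / s for x in v]


def estimate_compat(labels, edges, k, alpha=1.0):
    """Hypothetical learning step: count topic pairs over labeled links,
    with Laplace smoothing alpha, giving a symmetric compatibility table."""
    counts = [[alpha] * k for _ in range(k)]
    for i, j in edges:
        counts[labels[i]][labels[j]] += 1
        counts[labels[j]][labels[i]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def loopy_bp(local, edges, compat, iters=50, tol=1e-9):
    """Sum-product loopy belief propagation on a pairwise model.

    local:  per-document lists of non-negative topic evidence.
    edges:  undirected links (i, j), each listed once.
    compat: k x k table; compat[a][b] scores topic a at i and b at j.
    Returns approximate per-document topic marginals.
    """
    n, k = len(local), len(compat)
    nbrs = defaultdict(list)
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # msgs[(i, j)] is the message from document i to document j (over j's topics).
    msgs = {}
    for i, j in edges:
        msgs[(i, j)] = [1.0 / k] * k
        msgs[(j, i)] = [1.0 / k] * k
    for _ in range(iters):
        new, delta = {}, 0.0
        for i, j in msgs:
            # Local evidence at i times messages from i's neighbours other than j.
            prod = list(local[i])
            for u in nbrs[i]:
                if u != j:
                    prod = [p * m for p, m in zip(prod, msgs[(u, i)])]
            # Sum out i's topic a: m(b) = sum_a compat[a][b] * prod[a].
            m = normalize([sum(compat[a][b] * prod[a] for a in range(k)) for b in range(k)])
            delta = max(delta, max(abs(x - y) for x, y in zip(m, msgs[(i, j)])))
            new[(i, j)] = m
        msgs = new  # synchronous (parallel) update
        if delta < tol:
            break
    beliefs = []
    for i in range(n):
        b = list(local[i])
        for u in nbrs[i]:
            b = [p * m for p, m in zip(b, msgs[(u, i)])]
        beliefs.append(normalize(b))
    return beliefs


# Hypothetical training data: topics of four labeled pages and their links.
train_labels = [0, 0, 1, 1]
train_links = [(0, 1), (2, 3)]
compat = estimate_compat(train_labels, train_links, k=2)

# Hypothetical test graph: page 0 has strong text evidence for topic 0,
# pages 1 and 2 have uninformative text, and links form a chain 0 - 1 - 2.
local = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]
beliefs = loopy_bp(local, [(0, 1), (1, 2)], compat)
for doc, b in enumerate(beliefs):
    print(doc, [round(x, 3) for x in b])
# 0 [0.9, 0.1]
# 1 [0.7, 0.3]
# 2 [0.6, 0.4]
```

The output shows evidence about page 0 shifting its neighbor toward topic 0, and that neighbor in turn shifting page 2, with the effect weakening at each hop. On a chain (a tree), belief propagation gives exact marginals. On graphs with cycles, such as real hyperlink graphs, the same updates are only an approximation and are not guaranteed to converge.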