TY  - CONF
T1  - Materializing multi-relational databases from the web using taxonomic queries
T2  - Proceedings of the fourth ACM international conference on Web search and data mining
Y1  - 2011
A1  - Michelson, Matthew
A1  - Macskassy, Sofus A.
A1  - Minton, Steven N.
A1  - Getoor, Lise
KW  - discovering multi-relational data
KW  - multirelational data
AB  - Recently, much attention has been given to extracting tables from Web data. In this problem, the column definitions and tuples (such as which "company" is headquartered in which "city") are extracted from Web text, structured Web data such as lists, or results of querying the deep Web, creating the table of interest. In this paper, we examine the problem of extracting and discovering multiple tables in a given domain, generating a truly multi-relational database as output. Beyond discovering the relations that define single tables, our approach discovers and leverages "within column" set membership relations, and discovers relations across the extracted tables (e.g., joins). By leveraging within-column relations, our method can extract table instances that are ambiguous or rare, and by discovering joins, our method generates truly multi-relational output. Further, our approach uses taxonomic queries to bootstrap the extraction, rather than the more traditional "seed instances." Creating seeds often requires more domain knowledge than taxonomic queries, and previous work has shown that extraction methods may be sensitive to which input seeds they are given. We test our approach on two real-world domains: NBA basketball and cancer information. Our results demonstrate that our approach generates databases of relevant tables from disparate Web information, and discovers the relations between them. Further, we show that by leveraging the "within column" relation, our approach can identify a significant number of relevant tuples that would otherwise be difficult to identify.
JA  - Proceedings of the fourth ACM international conference on Web search and data mining
T3  - WSDM '11
PB  - ACM
CY  - New York, NY, USA
SN  - 978-1-4503-0493-1
UR  - http://doi.acm.org/10.1145/1935826.1935885
M3  - 10.1145/1935826.1935885
ER  -