A large portion of real-world data is stored in commercial relational database systems. In contrast, most statistical learning methods work only with "flat" data representations. Thus, to apply these methods, we are forced to convert the data into a flat form, thereby losing much of the relational structure present in the data and potentially introducing statistical skew. These drawbacks severely limit the ability of current methods to mine relational databases.
In this talk I will review recent work on probabilistic models, including Bayesian networks (BNs) and Probabilistic Relational Models (PRMs), and then describe techniques for automatically inducing PRMs directly from structured data stored in a relational or object-oriented database. These algorithms provide the tools needed to discover patterns in structured data and offer new techniques for mining relational data. Along the way, I will present experimental results in several domains, including a biological domain describing tuberculosis epidemiology, a database of scientific papers with author and citation information, and Web data. Finally, I will present an application of these techniques to selectivity estimation for database query optimization.
Joint work with Nir Friedman, Daphne Koller, Avi Pfeffer and Benjamin Taskar.
About the speaker: Lise Getoor joined the University of Maryland, College Park as an assistant professor this December. She earned her PhD from Stanford University. The title of her dissertation is 'Learning Statistical Models from Relational Data'. Her research interests include learning probabilistic models, data mining, constraint optimization and problem (re)formulation. She has published papers on a variety of topics including learning probabilistic models, utility elicitation, on-line scheduling, constraint-based planning and machine learning. Before coming to Stanford, she worked at NASA-Ames Research Center as a research associate. She received her M.S. in Computer Science from UC Berkeley in 1989 and her B.S. in Computer Science from UC Santa Barbara in 1986. She is the recipient of a National Physical Sciences Consortium fellowship and a member of ACM, AAAI and Tau Beta Pi.
For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Philip Resnik (resnik@umiacs.umd.edu).