[HOME] [EDUCATION] [PUBLICATIONS] [PATENTS] [RESEARCH] [SOFTWARE] [INTERNSHIPS] [COURSES] [PROJECTS] [PASTIMES


SCALABLE MACHINE LEARING FOR MASSIVE DATASETS: FAST SUMMATION ALGORITHMS

[ The art of getting 'good enough' solutions 'as fast as possible' ]

Huge data sets containing millions of training examples with a large number of attributes (tall fat data) are relatively easy to gather. However one of the bottlenecks for successful inference of useful information from the data is the computational complexity of machine learning algorithms. Most state-of-the-art nonparametric machine learning algorithms have a computational complexity of either O(N^2) or O(N^3), where N is the number of training examples. This has seriously restricted the use of massive data sets. The bottleneck computational primitive at the heart of various algorithms is the multiplication of a structured matrix with a vector, which we refer to as matrix-vector product (MVP) primitive. The goal of my thesis is to speedup up these MVP primitives by fast approximate algorithms that scale as O(N) and also provide high accuracy guarantees. I use ideas from computational physics, scientific computing, and computational geometry to design these algorithms. Currently the proposed algorithms have been applied in kernel density estimation, optimal bandwidth estimation, projection pursuit, Gaussian process regression, implicit surface fitting, and ranking.

[ Research Summary ] [ Thesis ] [ Slides ]

PUBLICATIONS

AISTATS 2007 A fast algorithm for learning large scale preference relations. Vikas C. Raykar, Ramani Duraiswami, and Balaji Krishnapuram, In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, San Juan, Peurto Rico, March 2007, pp. 385-392.  [paper] [slides] [code] [bib] [ More details can be found in CS-TR-4848 ] [oral presentation]

LKSM 2006 The Improved Fast Gauss Transform with applications to machine learning.  Vikas C. Raykar and Ramani Duraiswami, To appear in Large Scale Kernel Machines  L. Bottou, O. Chapelle, D. Decoste, and J. Weston (Eds), MIT Press 2006. [chapter] [ Detailed version available as CS-TR-4767 ]

SDM 2006 Fast optimal bandwidth selection for kernel density estimation. Vikas C. Raykar and Ramani Duraiswami, In Proceedings of the sixth SIAM International Conference on Data Mining, Bethesda, April 2006, pp. 524-528. [paper] [brief slides] [code] [bib] [ Detailed version available as CS-TR-4774 ]

WORKSHOP PRESENTATIONS

Learning 2007 Fast large scale Gaussian process regression using approximate matrix-vector productsVikas C. Raykar and Ramani Duraiswami,  Presented at the Learning workshop 2007, San Juan, Peurto Rico, March 2007. [abstract] [detailed paper] [slides]

NIPS 2005 The improved fast Gauss Transform with applications to machine learning.  Vikas C. Raykar and Ramani Duraiswami, Presented at the NIPS 2005 workshop on Large scale kernel machines. [slides] [video] [code]

TECHNICAL REPORTS

Fast weighted summation of erfc functions. Vikas C. Raykar, R. Duraiswami, and B. Krishnapuram, CS-TR-4848, Department of computer science, University of Maryland, CollegePark.  [abstract] [TR] [slides] [code] [bib]

Very fast optimal bandwidth selection for univariate kernel density estimation.  Vikas C. Raykar and R. Duraiswami, CS-TR-4774, Department of computer science, University of Maryland, CollegePark. [abstract] [TR] [slides] [code] [bib]

Fast computation of sums of Gaussians in high dimensions.  Vikas C. Raykar, C. Yang, R. Duraiswami, and N. Gumerov, CS-TR-4767, Department of computer science, University of Maryland, CollegePark. [abstract] [TR] [slides] [code] [bib]

UNPUBLISHED REPORTS

Scalable machine learning for massive datasets: Fast summation algorithms  Vikas C. Raykar A summary of the thesis contributions. [summary] [two page summary]

Computational tractability of machine learning algorithms for tall fat data. The prelimnary oral examination for the degree of Ph.D in computer science,  University of Maryland, CollegePark, May 4, 2006. [proposal] [slides] [reading list]

The fast Gauss transform with all the proofs. [ pdf ]

Correction to Lemma 2.2 of the fast Gauss transform. [ pdf

A short primer on the fast multipole method. [ pdf ]

SOFTWARE REALESES

Fast summation of erfc functions and ranking [code]

Fast optimal bandwidth selection for kernel density estimation. [code] [slides]

The improved fast Gauss Transform. [code]


The copyrights of publications are with the respective publishers. The papers are being reproduced here for timely dissemination of scholarly information.

09/30/2007

-