

SCALABLE MACHINE LEARNING FOR MASSIVE DATASETS: FAST SUMMATION ALGORITHMS

[ The art of getting 'good enough' solutions 'as fast as possible' ]

Huge data sets containing millions of training examples with a large number of attributes (tall fat data) are relatively easy to gather. However, one of the bottlenecks for successful inference of useful information from such data is the computational complexity of machine learning algorithms. Most state-of-the-art nonparametric machine learning algorithms have a computational complexity of either O(N^2) or O(N^3), where N is the number of training examples. This has seriously restricted the use of massive data sets. The bottleneck computational primitive at the heart of various algorithms is the multiplication of a structured matrix with a vector, which we refer to as the matrix-vector product (MVP) primitive. The goal of my thesis is to speed up these MVP primitives with fast approximate algorithms that scale as O(N) and also provide high accuracy guarantees. I use ideas from computational physics, scientific computing, and computational geometry to design these algorithms. The proposed algorithms have so far been applied to kernel density estimation, optimal bandwidth estimation, projection pursuit, Gaussian process regression, implicit surface fitting, and ranking.
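To make the MVP primitive concrete, the sketch below shows a one-dimensional discrete Gauss transform, G(y_j) = sum_i q_i exp(-(y_j - x_i)^2 / h^2), evaluated both directly in O(NM) time and via a truncated series factorization in O((N + M)p) time. This is a minimal single-cluster illustration of the idea behind the improved fast Gauss transform; the function names, the single expansion center, and the fixed truncation order p are simplifications for exposition (the actual algorithms cluster the sources and choose the truncation order adaptively to meet a prescribed error bound).

```python
import math

def gauss_transform_naive(sources, weights, targets, h):
    # Direct O(N*M) evaluation of G(y) = sum_i q_i exp(-(y - x_i)^2 / h^2).
    return [sum(q * math.exp(-((y - x) ** 2) / h ** 2)
                for x, q in zip(sources, weights))
            for y in targets]

def gauss_transform_taylor(sources, weights, targets, h, p=12):
    # Single-cluster factorization: writing d_x = x - c and d_y = y - c,
    #   exp(-(y-x)^2/h^2) = exp(-d_y^2/h^2) * exp(-d_x^2/h^2) * exp(2 d_y d_x / h^2),
    # and the last (entangled) factor is expanded in a p-term Taylor series.
    # The source-dependent moments C_k are accumulated once in O(N*p).
    c = sum(sources) / len(sources)  # common expansion center
    C = [0.0] * p
    for x, q in zip(sources, weights):
        base = q * math.exp(-((x - c) ** 2) / h ** 2)
        t = 1.0
        for k in range(p):
            C[k] += base * t
            t *= x - c
    for k in range(p):
        C[k] *= (2.0 / h ** 2) ** k / math.factorial(k)
    # Each target is then evaluated in O(p), independent of N.
    out = []
    for y in targets:
        s, t = 0.0, 1.0
        for k in range(p):
            s += C[k] * t
            t *= y - c
        out.append(math.exp(-((y - c) ** 2) / h ** 2) * s)
    return out
```

The key point is that the series decouples the sources from the targets: the N-dependent work (the moments C_k) is done once, after which each of the M targets costs O(p) instead of O(N). The truncation error decays factorially when the points are clustered on the scale of the bandwidth h, which is what makes an a priori accuracy guarantee possible.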

[ Research Summary ] [ Thesis ] [ Slides ]

PUBLICATIONS

    - Fast kernel density estimation -

Fast computation of kernel estimators. Vikas C. Raykar, Ramani Duraiswami, and Linda H. Zhao, Journal of Computational and Graphical Statistics, March 2010, vol. 19, no. 1, pp. 205-220. [abstract] [paper] [bib]

Fast optimal bandwidth selection for kernel density estimation. Vikas C. Raykar and Ramani Duraiswami, In Proceedings of the Sixth SIAM International Conference on Data Mining, Bethesda, April 2006, pp. 524-528. [paper] [brief slides] [code] [bib] [ Detailed version available as CS-TR-4774 ]

Very fast optimal bandwidth selection for univariate kernel density estimation. Vikas C. Raykar and R. Duraiswami, CS-TR-4774, Department of Computer Science, University of Maryland, College Park. [abstract] [TR] [slides] [code] [bib]

- Improved Fast Gauss Transform -

Automatic online tuning for fast Gaussian summation. Vlad I. Morariu, Balaji V. Srinivasan, Vikas C. Raykar, Ramani Duraiswami, and Larry Davis, In Advances in Neural Information Processing Systems (NIPS 2008), vol. 21, pp. 1113-1120, 2009. [paper] [spotlight slide] [bib] [code]

The improved fast Gauss transform with applications to machine learning. Vikas C. Raykar and Ramani Duraiswami, In Large Scale Kernel Machines, L. Bottou, O. Chapelle, D. Decoste, and J. Weston (Eds.), MIT Press 2006. [chapter]

The improved fast Gauss transform with applications to machine learning. Vikas C. Raykar and Ramani Duraiswami, Presented at the NIPS 2005 workshop on Large Scale Kernel Machines. [slides] [code] [video]

Fast computation of sums of Gaussians in high dimensions. Vikas C. Raykar, C. Yang, R. Duraiswami, and N. Gumerov, CS-TR-4767, Department of Computer Science, University of Maryland, College Park. [abstract] [TR] [slides] [code] [bib]

Fast large scale Gaussian process regression using approximate matrix-vector products. Vikas C. Raykar and Ramani Duraiswami, Presented at the Learning Workshop 2007, San Juan, Puerto Rico, March 2007. [abstract] [detailed paper] [slides]

Efficient Kriging via fast matrix-vector products. Nargess Memarsadeghi, Vikas C. Raykar, Ramani Duraiswami, and David M. Mount, In IEEE Aerospace Conference, Big Sky, Montana, March 2008. [paper]

- Fast ranking algorithms -

A fast algorithm for learning a ranking function from large scale data sets. Vikas C. Raykar, Ramani Duraiswami, and Balaji Krishnapuram, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 7, pp. 1158-1170, July 2008. [paper]

A fast algorithm for learning large scale preference relations. Vikas C. Raykar, Ramani Duraiswami, and Balaji Krishnapuram, In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, San Juan, Puerto Rico, March 2007, pp. 385-392. [paper] [slides] [code] [bib] [ More details can be found in CS-TR-4848 ] [oral presentation]

Fast weighted summation of erfc functions. Vikas C. Raykar, R. Duraiswami, and B. Krishnapuram, CS-TR-4848, Department of Computer Science, University of Maryland, College Park. [abstract] [TR] [slides] [code] [bib]

- Unpublished Reports -

Scalable machine learning for massive datasets: Fast summation algorithms. Vikas C. Raykar. A summary of the thesis contributions. [summary] [two page summary]

Computational tractability of machine learning algorithms for tall fat data. The preliminary oral examination for the degree of Ph.D. in computer science, University of Maryland, College Park, May 4, 2006. [proposal] [slides] [reading list]

The fast Gauss transform with all the proofs. [ pdf ]

Correction to Lemma 2.2 of the fast Gauss transform. [ pdf ]

A short primer on the fast multipole method. [ pdf ]

- Software -

Fast summation of erfc functions and ranking. [code]

Fast optimal bandwidth selection for kernel density estimation. [code] [slides]

The improved fast Gauss transform. [code]


The copyrights of publications are with the respective publishers. The papers are being reproduced here for timely dissemination of scholarly information.

09/30/2007
