
Arvind Agarwal
Department of Computer Science, University of Maryland
A. V. Williams Building (Rm 3126)
College Park, MD, 20742
Email: arvinda [AT] cs.umd.edu
News!!!
Defended my thesis successfully, July 2012.
Paper on "Improving Watson using Rank Aggregation" accepted in CIKM 2012.
Internship at IBM T. J. Watson, Research Center, NY, Summer 2011.
Invited talk at International Joint Conference on Artificial Intelligence, 2011.
Best Student paper award in machine learning track at ECML, 2010.
I am currently a PhD student (Fall 2008 - present) at Department of Computer Science in University of Maryland, College Park. My advisor is Prof. Hal Daume III. I completed my M.S. in Computer Science in Spring 2008 from School of Computing, University of Utah. I did my undergraduate (Bachelors of Engineering) from Birla Institute of Technology & Science (BITS), Pilani, India.
My research interests are in Machine Learning in particular, in problems related to social computing, recommendation systems, data mining/visualization with an emphasis on large scale algorithms, and matrix and graph algorithms. I am also interested in the problems/techniques that have a geometric flavor in it e.g., kernel methods, distance/metric based methods, graph-based methods etc. My interests also extend to the fields of computational geometry and information geometry.
I spent summers of 2007 and 2011, interning at IBM, T. J. Watson Research Center, NY. During my internships, I had a chance to work on Watson - IBM's famous question-answering system.Publications: C = Conference, J = Journal, W = Workshop, P = In Preparation.
Selected:
Learning to Rank for Robust Question Answering [pdf][abstract] C
This paper aims to solve the problem of improving the ranking of answer candidates for factoid based questions in a state-of-the-art Question Answering system. We first provide an extensive comparison of 5 ranking algorithms on two datasets - from the Jeopardy quiz show and a medical domain. We then show the effectiveness of a cascading approach, where the ranking produced by one ranker is used as input to the next stage. The cascading approach shows sizeable gains on both datasets. We finally evaluate several rank aggregation techniques to combine these algorithms, and find that Supervised Kemeny aggregation is a robust technique that always beats the baseline ranking approach used by Watson for the Jeopardy competition. We further corroborate our results on TREC Question Answering datasets.
Generative Kernels for Exponential Families [pdf][abstract] C
In this paper, we propose a family of kernels for the data distributions belonging to the exponential family. We call these kernels {\it generative kernels} because they take into account the generative process of the data. Our proposed method considers the geometry of the data distribution to build a set of efficient closed-form kernels best suited for that distribution. We compare our generative kernels on multinomial data and observe improved empirical performance across the board. Moreover, our generative kernels perform significantly better when training size is small, an important property of the generative models.
A Unified Algorithmic Framework for Multi-Dimensional Scaling [pdf][Code][abstract] C
In this paper, we propose a unified algorithmic framework for solving many known variants of MDS. Our algorithm is a simple iterative scheme with guaranteed convergence, and is modular; by changing the internals of a single subroutine in the algorithm, we can switch cost unctions and target spaces easily. In addition to the formal guarantees of convergence, our algorithms are accurate; in most cases, they converge to better quality solutions than existing methods, in comparable time. We expect that this framework will be useful for a number of MDS variants that have not yet been studied.
Exponential Family Hybrid Semi-Supervised Learning [pdf][abstract] C
We present an approach to semi-supervised learning based on an exponential family characterization. Our approach generalizes previous work on coupled priors for hybrid generative/discriminative models. Our model is more flexible and natural than previous approaches. Experimental results on several data sets show that our approach also performs better in practice.
Complete List: (in chronological order)
Adaptive Sampling for Large Scale MDS [pdf] C P
Arvind Agarwal and Chad Brubaker and Hal Daume III and Jeff M. Phillips and Suresh Venkatasubramanian; Under review.Large Scale Robust Sensor Network Localization P
Arvind Agarwal and Hal Daume III and Jeff M. Phillips and Suresh Venkatasubramanian; Under review.Learning to Rank for Robust Question Answering [pdf][abstract] C
This paper aims to solve the problem of improving the ranking of answer candidates for factoid based questions in a state-of-the-art Question Answering system. We first provide an extensive comparison of 5 ranking algorithms on two datasets - from the Jeopardy quiz show and a medical domain. We then show the effectiveness of a cascading approach, where the ranking produced by one ranker is used as input to the next stage. The cascading approach shows sizeable gains on both datasets. We finally evaluate several rank aggregation techniques to combine these algorithms, and find that Supervised Kemeny aggregation is a robust technique that always beats the baseline ranking approach used by Watson for the Jeopardy competition. We further corroborate our results on TREC Question Answering datasets.
Generative Kernels for Exponential Families [pdf][abstract] C
In this paper, we propose a family of kernels for the data distributions belonging to the exponential family. We call these kernels {\it generative kernels} because they take into account the generative process of the data. Our proposed method considers the geometry of the data distribution to build a set of efficient closed-form kernels best suited for that distribution. We compare our generative kernels on multinomial data and observe improved empirical performance across the board. Moreover, our generative kernels perform significantly better when training size is small, an important property of the generative models.
Learning Multiple Tasks using Manifold Regularization [pdf][abstract] C
We present a novel method for multitask learning (MTL) based on manifold regularization. We assume that all task parameters lie on a manifold which is the generalization of the assumption made in the existing literature i.e., task parameters share a common linear subspace. The proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes into learning independent tasks, making it appealing for learning new tasks. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets.
A Geometric View of Conjugate Priors [pdf][abstract][arXiv] J (Best Student Paper)
In Bayesian machine learning, conjugate priors are popular, mostly due to mathematical convenience. In this paper, we show that there are deeper reasons for choosing a conjugate prior. Specifically, we formulate the conjugate prior in the form of Bregman divergence and show that it is the inherent geometry of conjugate priors that makes them appropriate and intuitive. This geometric interpretation allows one to view the hyperparameters of conjugate priors as the effective sample points, thus providing additional intuition. We use this geometric understanding of conjugate priors to derive the hyperparameters and expression of the prior used to couple the generative and discriminative components of a hybrid model for semi-supervised learning.
A Unified Algorithmic Framework for Multi-Dimensional Scaling [pdf][Code][abstract] C
In this paper, we propose a unified algorithmic framework for solving many known variants of MDS. Our algorithm is a simple iterative scheme with guaranteed convergence, and is modular; by changing the internals of a single subroutine in the algorithm, we can switch cost unctions and target spaces easily. In addition to the formal guarantees of convergence, our algorithms are accurate; in most cases, they converge to better quality solutions than existing methods, in comparable time. We expect that this framework will be useful for a number of MDS variants that have not yet been studied.
Multitask Learning using Transformation Functions[pdf] W
Arvind Agarwal and Hal Daume III; The Learning Workshop 2010, Snowbird, Utah.Incremental Multi-Dimensional Scaling[pdf] W
Arvind Agarwal and Jeff M. Phillips and Hal Daume III and Suresh Venkatasubramanian; The Learning Workshop 2010, Snowbird, Utah.Exponential Family Hybrid Semi-Supervised Learning [pdf][abstract] C
We present an approach to semi-supervised learning based on an exponential family characterization. Our approach generalizes previous work on coupled priors for hybrid generative/discriminative models. Our model is more flexible and natural than previous approaches. Experimental results on several data sets show that our approach also performs better in practice.
Research Positions:
Summer Internship: IBM T J Watson, NY, Summer 2011 (Worked on QA system Watson!)
Graduate Research Assistant: Department of Computer Science, University of Maryland, MD; Fall-10, Spring-11, Fall-12 - Present
Graduate Research Assistant: School of computing, University of Utah, Salt Lake City UT, Summer-06, Spring-07, Fall-07-Summer-10
Summer Internship : IBM T J Watson NY, Summer 2007 (Worked on QA system Watson!)
Professional Activities:
Primary Reviewer: NIPS-11, ICML-12
Secondary Reviewer: NIPS-09,10; IJCAI-09, AISTATS-09, ICML-08,09,10, UAI-09
Course Work:
Courses: Machine Learning, Artificial Intelligence, Computational Geometry, Advanced Algorithms, Natural Language Processing, Applications of NLP, Randomized Algorithms, Database Management System.
Seminars: Graphical Models, Manifold Learning, Clustering, Geometry of Information Spaces, Kernel Methods, Optimization