Hal Daumé III


2012

Regularized Interlingual Projections: Evaluation on Multilingual Transliteration
Jagadeesh Jagarlamudi, Hal Daumé III
2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012
@InProceedings{daume12transliterate,
   author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III}},
   title = {Regularized Interlingual Projections: Evaluation on Multilingual Transliteration},
   booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning},
   month = {July},
   year = {2012},
   address = {Jeju Island, Korea},
   publisher = {Association for Computational Linguistics},
   pages = {12--23},
   url = {http://www.aclweb.org/anthology/D12-1002}
}


Imitation Learning by Coaching
Abstract: Imitation Learning has been shown to be successful in solving many challenging real-world problems. Some recent approaches give strong performance guarantees by training the policy iteratively. However, it is important to note that these guarantees depend on how well the policy we found can imitate the oracle on the training data. When there is a substantial difference between the oracle’s ability and the learner’s policy space, we may fail to find a policy that has low error on the training set. In such cases, we propose to use a coach that demonstrates easy-to-learn actions for the learner and gradually approaches the oracle. By a reduction of learning by demonstration to online learning, we prove that coaching can yield a lower regret bound than using the oracle. We apply our algorithm to cost-sensitive dynamic feature selection, a hard decision problem that considers a user-specified accuracy-cost trade-off. Experimental results on UCI datasets show that our method outperforms state-of-the-art imitation learning methods in dynamic feature selection and two static feature selection methods.
He He, Hal Daumé III, Jason Eisner
Neural Information Processing Systems (NIPS), 2012
@InProceedings{daume12coaching,
   author = {He He and Hal {Daum\'e III} and Jason Eisner},
   title = {Imitation Learning by Coaching},
   booktitle = {Neural Information Processing Systems (NIPS)},
   year = {2012},
   url = {http://hal3.name/docs/#daume12coaching}
}
   


Detecting Visual Text
Abstract: When people describe a scene, they often include information that is not visually apparent; sometimes based on background knowledge, sometimes to tell a story. We aim to separate visual text (descriptions of what is being seen) from non-visual text in natural images and their descriptions. To do so, we first concretely define what it means to be visual, annotate visual text and then develop algorithms to automatically classify noun phrases as visual or non-visual. We find that using text alone, we are able to achieve high accuracies at this task, and that incorporating features derived from computer vision algorithms improves performance. Finally, we show that we can reliably mine visual nouns and adjectives from large corpora and that we can use these effectively in the classification task.
[data/code]
Jesse Dodge, Amit Goyal, Xufeng Han, Alyssa Mensch, Margaret Mitchell, Karl Stratos, Kota Yamaguchi, Yejin Choi, Hal Daumé III, Alexander C. Berg, Tamara L. Berg
North American Chapter of the Association for Computational Linguistics (NAACL), 2012
@InProceedings{daume12desctext,
   author = {Jesse Dodge and Amit Goyal and Xufeng Han and Alyssa Mensch and Margaret Mitchell and Karl Stratos and Kota Yamaguchi and Yejin Choi and Hal {Daum\'e III} and Alexander C. Berg and Tamara L. Berg},
   title = {Detecting Visual Text},
   booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)},
   year = {2012},
   url = {http://hal3.name/docs/#daume12desctext}
}


Understanding and Predicting Importance in Images
Abstract: What do people care about in an image? To drive computational visual recognition toward more human-centric outputs, we need a better understanding of how people perceive and judge the importance of content in images. In this paper, we explore how a number of factors relate to human perception of importance. Proposed factors fall into 3 broad types: 1) factors related to composition, e.g. size, location, 2) factors related to semantics, e.g. category of object or scene, and 3) contextual factors related to the likelihood of attribute-object, or object-scene pairs. We explore these factors using what people describe as a proxy for importance. Finally, we build models to predict what will be described about an image given either known image content, or image content estimated automatically by recognition systems.
Karl Stratos, Aneesh Sood, Alyssa Mensch, Xufeng Han, Margaret Mitchell, Kota Yamaguchi, Jesse Dodge, Amit Goyal, Hal Daumé III, Alexander C. Berg, Tamara L. Berg
Computer Vision and Pattern Recognition (CVPR), 2012
@InProceedings{daume12importance,
   author = {Karl Stratos and Aneesh Sood and Alyssa Mensch and Xufeng Han and Margaret Mitchell and Kota Yamaguchi and Jesse Dodge and Amit Goyal and Hal {Daum\'e III} and Alexander C. Berg and Tamara L. Berg},
   title = {Understanding and Predicting Importance in Images},
   booktitle = {Computer Vision and Pattern Recognition (CVPR)},
   year = {2012},
   url = {http://hal3.name/docs/#daume12importance}
}


Midge: Generating Image Descriptions From Computer Vision Detections
Abstract: This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees. Results show that the generation system outperforms state-of-the-art systems, automatically generating some of the most natural image descriptions to date.
Margaret Mitchell, Jesse Dodge, Amit Goyal, Kota Yamaguchi, Karl Stratos, Xufeng Han, Alyssa Mensch, Alexander C. Berg, Tamara L. Berg, Hal Daumé III
European Chapter of the Association for Computational Linguistics (EACL), 2012
@InProceedings{daume12midge,
   author = {Margaret Mitchell and Jesse Dodge and Amit Goyal and Kota Yamaguchi and Karl Stratos and Xufeng Han and Alyssa Mensch and Alexander C. Berg and Tamara L. Berg and Hal {Daum\'e III}},
   title = {Midge: Generating Image Descriptions From Computer Vision Detections},
   booktitle = {European Chapter of the Association for Computational Linguistics (EACL)},
   year = {2012},
   url = {http://hal3.name/docs/#daume12midge}
}


Sketch Algorithms for Estimating Point Queries in NLP
Amit Goyal, Hal Daumé III, Graham Cormode
Empirical Methods in Natural Language Processing (EMNLP), 2012
@InProceedings{daume12pointquery,
   author = {Amit Goyal and Hal {Daum\'e III} and Graham Cormode},
   title = {Sketch Algorithms for Estimating Point Queries in {NLP}},
   booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
   year = {2012},
   url = {http://hal3.name/docs/#daume12pointquery}
}


Learning Task Grouping and Overlap in Multi-task Learning
Abstract: In the paradigm of multi-task learning, multiple related prediction tasks are learned jointly, sharing information across the tasks. We propose a framework for multi-task learning that enables one to selectively share the information across the tasks. We assume that each task parameter vector is a linear combination of a finite number of underlying basis tasks. The coefficients of the linear combination are sparse in nature and the overlap in the sparsity patterns of two tasks controls the amount of sharing across these. Our model is based on the assumption that task parameters within a group lie in a low dimensional subspace but allows the tasks in different groups to overlap with each other in one or more bases. Experimental results on four datasets show that our approach outperforms competing methods.
Abhishek Kumar, Hal Daumé III
International Conference on Machine Learning (ICML), 2012
@InProceedings{daume12gomtl,
   author = {Abhishek Kumar and Hal {Daum\'e III}},
   title = {Learning Task Grouping and Overlap in Multi-task Learning},
   booktitle = {International Conference on Machine Learning (ICML)},
   year = {2012},
   url = {http://hal3.name/docs/#daume12gomtl}
}


A Binary Classification Framework for Two-Stage Multiple Kernel Learning
Abstract: With the advent of kernel methods, automating the task of specifying a suitable kernel has become increasingly important. In this context, the Multiple Kernel Learning (MKL) problem of finding a combination of prespecified base kernels that is suitable for the task at hand has received significant attention from researchers. In this paper we show that Multiple Kernel Learning can be framed as a standard binary classification problem with additional constraints that ensure the positive definiteness of the learned kernel. Framing MKL in this way has the distinct advantage that it makes it easy to leverage the extensive research in binary classification to develop better performing and more scalable MKL algorithms that are conceptually simpler, and, arguably, more accessible to practitioners. Experiments on nine data sets from different domains show that, despite its simplicity, the proposed technique compares favorably with current leading MKL approaches.
Abhishek Kumar, Alexandru Niculescu-Mizil, Koray Kavukcuoglu, Hal Daumé III
International Conference on Machine Learning (ICML), 2012
@InProceedings{daume12binarymkl,
   author = {Abhishek Kumar and Alexandru Niculescu-Mizil and Koray Kavukcuoglu and Hal {Daum\'e III}},
   title = {A Binary Classification Framework for Two-Stage Multiple Kernel Learning},
   booktitle = {International Conference on Machine Learning (ICML)},
   year = {2012},
   url = {http://hal3.name/docs/#daume12binarymkl}
}


Generalized Multiview Analysis: A Discriminative latent space
Abstract: This paper presents a general multi-view feature extraction approach that we call Generalized Multiview Analysis or GMA. GMA has all the desirable properties required for cross-view classification and retrieval: it is supervised, it allows generalization to unseen classes, it is multi-view and kernelizable, it affords an efficient eigenvalue based solution and is applicable to any domain. GMA exploits the fact that most popular supervised and unsupervised feature extraction techniques are the solution of a special form of a quadratically constrained quadratic program (QCQP), which can be solved efficiently as a generalized eigenvalue problem. GMA solves a joint, relaxed QCQP over different feature spaces to obtain a single (non)linear subspace. Intuitively, GMA is a supervised extension of Canonical Correlation Analysis (CCA), which is useful for cross-view classification and retrieval. The proposed approach is general and has the potential to replace CCA whenever classification or retrieval is the purpose and label information is available. We outperform previous approaches for text-image retrieval on Pascal and Wiki text-image data. We report state-of-the-art results for pose and lighting invariant face recognition on the MultiPIE face dataset, significantly outperforming other approaches.
Abhishek Sharma, Abhishek Kumar, Hal Daumé III, David Jacobs
Computer Vision and Pattern Recognition (CVPR), 2012
@InProceedings{daume12gma,
   author = {Abhishek Sharma and Abhishek Kumar and Hal {Daum\'e III} and David Jacobs},
   title = {Generalized Multiview Analysis: A Discriminative latent space},
   booktitle = {Computer Vision and Pattern Recognition (CVPR)},
   year = {2012},
   url = {http://hal3.name/docs/#daume12gma}
}


Incorporating Lexical Priors into Topic Models
Abstract: Topic models have great potential for helping users understand document corpora. This potential is stymied by their purely unsupervised nature, which often leads to topics that are neither entirely meaningful nor effective in extrinsic tasks (Chang et al., 2009). We propose a simple and effective way to guide topic models to learn topics of specific interest to a user. We achieve this by providing sets of seed words that a user believes are representative of the underlying topics in a corpus. Our model uses these seeds to improve both topic-word distributions (by biasing topics to produce appropriate seed words) and document-topic distributions (by biasing documents to select topics related to the seed words they contain). Extrinsic evaluation on a document clustering task reveals a significant improvement when using seed information, even over other models that use seed information naively.
Jagadeesh Jagarlamudi, Hal Daumé III, Raghavendra Udupa
Conference on European Chapter of the Association for Computational Linguistics (EACL), 2012
@inproceedings{daume12seeded,
   title = {Incorporating Lexical Priors into Topic Models},
   author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III} and Raghavendra Udupa},
   booktitle = {Proceedings of the Conference on European Chapter of the Association for Computational Linguistics (EACL)},
   year = {2012},
   address = {Avignon, France},
   url = {http://hal3.name/docs/#daume12seeded}
}


Cost-sensitive Dynamic Feature Selection
Abstract: We present an instance-specific dynamic feature selection algorithm at test time, which sequentially chooses features given values of already selected features and stops to make a prediction according to a user-specified speed-accuracy trade-off. We apply imitation learning techniques to address the problem of learning and inference jointly in a simple multiclass classification setting. Our feature selection method treats the given solver (e.g. a classifier trained with a full set of features) as a black box and does not have any constraint on it. Experimental results show that using a dynamic instance-specific feature set can significantly improve accuracy at a low cost.
He He, Hal Daumé III, Jason Eisner
ICML 2012 Workshop on Interactions between Inference and Learning (Inferning), 2012
@inproceedings{daume12dynafea,
   title = {Cost-sensitive Dynamic Feature Selection},
   author = {He He and Hal {Daum\'e III} and Jason Eisner},
   booktitle = {ICML 2012 Workshop on Interactions between Inference and Learning (Inferning)},
   year = {2012},
   address = {Edinburgh, Scotland},
   url = {http://hal3.name/docs/#daume12dynafeat}
}


Fast Large-Scale Approximate Graph Construction for NLP
Amit Goyal, Hal Daumé III, Raul Guerra
Empirical Methods in Natural Language Processing (EMNLP), 2012
@InProceedings{daume12flag,
   author = {Amit Goyal and Hal {Daum\'e III} and Raul Guerra},
   title = {Fast Large-Scale Approximate Graph Construction for {NLP}},
   booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
   year = {2012},
   url = {http://hal3.name/docs/#daume12flag}
}


Flexible Modeling of Latent Task Structures in Multitask Learning
Abstract: Multitask learning algorithms are typically designed assuming some fixed, a priori known latent structure shared by all the tasks. However, it is usually unclear what type of latent task structure is the most appropriate for a given multitask learning problem. Ideally, the "right" latent task structure should be learned in a data-driven manner. We present a flexible, nonparametric Bayesian model that posits a mixture of factor analyzers structure on the tasks. The nonparametric aspect makes the model expressive enough to subsume many existing models of latent task structures (e.g., mean-regularized tasks, clustered tasks, low-rank or linear/non-linear subspace assumption on tasks, etc.). Moreover, it can also learn more general task structures, addressing the shortcomings of such models. We present a variational inference algorithm for our model. Experimental results on synthetic and real-world datasets, on both regression and classification problems, demonstrate the effectiveness of the proposed method.
Alexandre Passos, Piyush Rai, Jacques Wainer, Hal Daumé III
International Conference on Machine Learning (ICML), 2012
@InProceedings{daume12flexiblemtl,
   author = {Alexandre Passos and Piyush Rai and Jacques Wainer and Hal {Daum\'e III}},
   title = {Flexible Modeling of Latent Task Structures in Multitask Learning},
   booktitle = {International Conference on Machine Learning (ICML)},
   year = {2012},
   address = {Edinburgh, Scotland},
   url = {http://hal3.name/docs/#daume12flexiblemtl}
}


Low-dimensional Discriminative Reranking
Abstract: The accuracy of many natural language processing tasks can be improved by a reranking step, which involves selecting a single output from a list of candidate outputs generated by a baseline system. We propose a novel family of reranking algorithms based on learning separate low-dimensional embeddings of the task’s input and output spaces. This embedding is learned in such a way that prediction becomes a low-dimensional nearest-neighbor search, which can be done computationally efficiently. A key quality of our approach is that feature engineering can be done separately on the input and output spaces; the relationship between inputs and outputs is learned automatically. Experiments on a part-of-speech tagging task in four languages show significant improvements over a baseline decoder and existing reranking approaches.
Jagadeesh Jagarlamudi, Hal Daumé III
Conference on North American Chapter of the Association for Computational Linguistics, 2012
@inproceedings{daume12lowdim,
   title = {Low-dimensional Discriminative Reranking},
   author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III}},
   booktitle = {Proceedings of the Conference on North American Chapter of the Association for Computational Linguistics},
   year = {2012},
   address = {Montreal, Canada},
   url = {http://hal3.name/docs/#daume12lowdim}
}


Protocols for Learning Classifiers on Distributed Data
Abstract: We consider the problem of learning classifiers for labeled data distributed across several nodes. The goal is to find a single classifier across all datasets with small approximation error, where the quantity to be minimized is communication between nodes. This setting models real-world communication bottlenecks in the processing of massive distributed datasets. We present several very general sampling-based solutions as well as some two-way protocols which have a provable exponential speed-up over any one-way protocol. We focus on core problems for noiseless data distributed across two or more nodes. The techniques we introduce are reminiscent of active learning, but rather than actively probing labels, nodes actively communicate with each other, each node simultaneously learning the important data from the other node.
Hal Daumé III, Jeff Phillips, Avishek Saha, Suresh Venkatasubramanian
International Conference on Artificial Intelligence and Statistics (AIStats), 2012
@inproceedings{daume12distributed,
   title = {Protocols for Learning Classifiers on Distributed Data},
   author = {Hal {Daum\'e III} and Jeff Phillips and Avishek Saha and Suresh Venkatasubramanian},
   booktitle = {Proceedings of the International Conference on Artificial Intelligence and Statistics (AIStats)},
   year = {2012},
   address = {Canary Islands},
   url = {http://hal3.name/docs/#daume12distributed}
}


2011

A Computational Model for Plot Units
Amit Goyal, Ellen Riloff, Hal Daumé III
Computational Intelligence Journal, 2011
@article{daume11plotunits,
   author = {Amit Goyal and Ellen Riloff and Hal {Daum\'e III}},
   title = {A Computational Model for Plot Units},
   journal = {Computational Intelligence Journal},
   year = {2011},
   url = {http://hal3.name/docs/#daume11plotunits}
}


Speed-Accuracy Tradeoffs in Nondeterministic Inference Algorithms
Abstract: Statistical learning has led to great advances in building models that achieve high accuracy. However, test-time inference in these models can be slow, for example in structured prediction problems. This is frequently addressed by using test-time heuristics to guide and prune the search for a good structured output. In this high-level paper, we ask: Could we explicitly train such heuristics to trade off accuracy and efficiency? And how does this relate to existing learning problems?
Jason Eisner, Hal Daumé III
COST: NIPS 2011 Workshop on Computational Trade-offs in Statistical Learning, 2011
@InProceedings{daume11tradeoffs,
   author = {Jason Eisner and Hal {Daum\'e III}},
   title = {Speed-Accuracy Tradeoffs in Nondeterministic Inference Algorithms},
   booktitle = {Proceedings of COST: NIPS 2011 Workshop on Computational Trade-offs in Statistical Learning},
   year = {2011},
   address = {Sierra Nevada, Spain},
   url = {http://hal3.name/docs/#daume11tradeoffs}
}


From Bilingual Dictionaries to Interlingual Document Representations
Abstract: Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to learn an interlingual representation, making them sensitive to the domain of the training data. In this paper, we learn an interlingual representation in an unsupervised manner using only a bilingual dictionary. We first use the bilingual dictionary to find candidate document alignments and then use them to find an interlingual representation. Since the candidate alignments are noisy, we develop a robust learning algorithm to learn the interlingual representation. We show that bilingual dictionaries generalize to different domains better: our approach gives better performance than either a word by word translation method or Canonical Correlation Analysis (CCA) trained on a different domain.
Jagadeesh Jagarlamudi, Hal Daumé III, Raghavendra Udupa
Association for Computational Linguistics (ACL), 2011
@InProceedings{daume11interlingual,
   author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III} and Raghavendra Udupa},
   title = {From Bilingual Dictionaries to Interlingual Document Representations},
   booktitle = {Association for Computational Linguistics (ACL)},
   year = {2011},
   address = {Portland, OR},
   url = {http://hal3.name/docs/#daume11interlingual}
}


Multiple Hash Functions for Learning
Abstract: In this paper, we explore the idea of feature-hashing in learning problems. We first evaluate some hashing strategies on the basis of their efficacy on classification problems. We then explore the following trade-off: Given a fixed budget (say K) for the hashed feature vector, should one use a single hash function that gives a hashed vector of size K, or use multiple hash functions to come up with smaller representations (say 3 hash functions, each giving a representation of size K/3)? In particular, for the latter setting, how should the different hashed representations be combined? We propose online learning algorithms for this setting using multiple Perceptrons (one for each hashed representation), and explore a number of Perceptron update and prediction schemes. Experimental results demonstrate that our update schemes give better classification accuracies than the case when a single hashed feature vector is used to train the model.
Amit Goyal, Piyush Rai, Hal Daumé III
NIPS Big Learning Workshop, 2011
@InProceedings{daume11multihash,
   author = {Amit Goyal and Piyush Rai and Hal {Daum\'e III}},
   title = {Multiple Hash Functions for Learning},
   booktitle = {NIPS Big Learning Workshop},
   year = {2011},
   address = {Sierra Nevada, Spain},
   url = {http://hal3.name/docs/#daume11multihash}
}


Computational methods are invaluable for typology, but the models must match the questions: Commentary on Dunn et al. (2011)
Roger Levy, Hal Daumé III
Unpublished, 2011
@Misc{daume11dunn,
   author = {Roger Levy and Hal {Daum\'e III}},
   title = {Computational methods are invaluable for typology, but the models must match the questions: Commentary on Dunn et al. (2011)},
   howpublished = {Journal of Linguistic Typology},
   year = {2011},
   url = {http://hal3.name/docs/#daume11dunn}
}


Message-Passing for Approximate MAP Inference with Latent Variables
Jiarong Jiang, Piyush Rai, Hal Daumé III
Conference on Neural Information Processing Systems (NIPS), 2011
@InProceedings{daume11mapmarg,
   author = {Jiarong Jiang and Piyush Rai and Hal {Daum\'e III}},
   title = {Message-Passing for Approximate MAP Inference with Latent Variables},
   booktitle = {Proceedings of the Conference on Neural Information Processing Systems (NIPS)},
   year = {2011},
   address = {Granada, Spain},
   url = {http://hal3.name/docs/#daume11mapmarg}
}


Co-regularized Multi-view Spectral Clustering
Abhishek Kumar, Piyush Rai, Hal Daumé III
Conference on Neural Information Processing Systems (NIPS), 2011
@InProceedings{daume11spectral,
   author = {Abhishek Kumar and Piyush Rai and Hal {Daum\'e III}},
   title = {Co-regularized Multi-view Spectral Clustering},
   booktitle = {Proceedings of the Conference on Neural Information Processing Systems (NIPS)},
   year = {2011},
   address = {Granada, Spain},
   url = {http://hal3.name/docs/#daume11spectral}
}


Active Supervised Domain Adaptation
Abstract: In this paper, we harness the synergy between two important learning paradigms, namely, active learning and domain adaptation. We show how active learning in a target domain can leverage information from a different but related source domain. Our proposed framework, Active Learning Domain Adapted (ALDA), uses source domain knowledge to transfer information that facilitates active learning in the target domain. We propose two variants of ALDA: a batch B-ALDA and an online O-ALDA. Empirical comparisons with numerous baselines on real-world datasets establish the efficacy of the proposed methods.
Avishek Saha, Piyush Rai, Hal Daumé III, Suresh Venkatasubramanian, Scott L. DuVall
European Conference on Machine Learning (ECML), 2011
@InProceedings{daume11alda,
   author = {Avishek Saha and Piyush Rai and Hal {Daum\'e III} and Suresh Venkatasubramanian and Scott L. DuVall},
   title = {Active Supervised Domain Adaptation},
   booktitle = {European Conference on Machine Learning (ECML)},
   year = {2011},
   address = {Athens, Greece},
   tags = {ml da},
   url = {http://hal3.name/docs/#daume11alda},
}
   


Corpus-Guided Sentence Generation of Natural Images
Abstract: We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The inputs are initial noisy estimates of the objects and scenes detected in the image using state-of-the-art trained detectors. As predicting actions from still images directly is unreliable, we use a language model trained from the English Gigaword corpus to obtain their estimates, together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters of an HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.
Yezhou Yang, Ching Lik Teo, Hal Daumé III, Yiannis Aloimonos
Empirical Methods in Natural Language Processing (EMNLP), 2011
@InProceedings{daume11generation,
   author = {Yezhou Yang and Ching Lik Teo and Hal {Daum\'e III} and Yiannis Aloimonos},
   title = {Corpus-Guided Sentence Generation of Natural Images},
   booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
   year = {2011},
   address = {Edinburgh, Scotland},
}


Beam Search based MAP Estimates for the Indian Buffet Process
Piyush Rai, Hal Daumé III
International Conference on Machine Learning (ICML), 2011
@InProceedings{daume11ibpsearch,
   author = {Piyush Rai and Hal {Daum\'e III}},
   title = {Beam Search based MAP Estimates for the Indian Buffet Process},
   booktitle = {International Conference on Machine Learning (ICML)},
   year = {2011},
   address = {Bellevue, WA},
   url = {http://hal3.name/docs/#daume11ibpsearch}
}


A Co-training Approach for Multiview Spectral Clustering
Abhishek Kumar, Hal Daumé III
International Conference on Machine Learning (ICML), 2011
@InProceedings{daume11cospec,
   author = {Abhishek Kumar and Hal {Daum\'e III}},
   title = {A Co-training Approach for Multiview Spectral Clustering},
   booktitle = {International Conference on Machine Learning (ICML)},
   year = {2011},
   address = {Bellevue, WA},
   url = {http://hal3.name/docs/#daume11cospec}
}


Improving Bilingual Projections via Sparse Covariance Matrices
Jagadeesh Jagarlamudi, Raghavendra Udupa, Hal Daumé III, Abhijit Bhole
Empirical Methods in Natural Language Processing (EMNLP), 2011
@InProceedings{daume11sparse,
   author = {Jagadeesh Jagarlamudi and Raghavendra Udupa and Hal {Daum\'e III} and Abhijit Bhole},
   title = {Improving Bilingual Projections via Sparse Covariance Matrices},
   booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
   year = {2011},
   address = {Edinburgh, Scotland},
}


Approximate Scalable Bounded Space Sketch for Large Data NLP
Amit Goyal, Hal Daumé III
Empirical Methods in Natural Language Processing (EMNLP), 2011
@InProceedings{daume11sketch,
   author = {Amit Goyal and Hal {Daum\'e III}},
   title = {Approximate Scalable Bounded Space Sketch for Large Data {NLP}},
   booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
   year = {2011},
   address = {Edinburgh, Scotland},
   url = {http://hal3.name/docs/#daume11sketch}
}


Lossy Conservative Update (LCU) sketch: Succinct approximate count storage
Amit Goyal, Hal Daumé III
Conference on Artificial Intelligence (AAAI), 2011
@InProceedings{daume11lcu,
   author = {Amit Goyal and Hal {Daum\'e III}},
   title = {Lossy Conservative Update ({LCU}) sketch: Succinct approximate count storage},
   booktitle = {Conference on Artificial Intelligence (AAAI)},
   year = {2011},
   address = {Portland, OR},
   url = {http://hal3.name/docs/#daume11lcu}
}


Online Learning of Multiple Tasks and Their Relationships
Abstract: We propose an Online MultiTask Learning (OMTL) framework which simultaneously learns the task weight vectors as well as the task relatedness adaptively from the data. Our work is in contrast with prior work on online multitask learning which assumes fixed task relatedness, a priori. Furthermore, whereas prior work in such settings assumes only positively correlated tasks, our framework can capture negative correlations as well. Our proposed framework learns the task relationship matrix by framing the objective function as a Bregman divergence minimization problem for positive definite matrices. Subsequently, we exploit this adaptively learned task-relationship matrix to select the most informative samples in an online multitask active learning setting. Experimental results on a number of real-world datasets and comparisons with numerous baselines establish the efficacy of our proposed approach.
Avishek Saha, Piyush Rai, Hal Daumé III, Suresh Venkatasubramanian
Conference on Artificial Intelligence and Statistics (AI-Stats), 2011
@InProceedings{daume11olmt,
   author = {Avishek Saha and Piyush Rai and Hal {Daum\'e III} and Suresh Venkatasubramanian},
   title = {Online Learning of Multiple Tasks and Their Relationships},
   booktitle = {Conference on Artificial Intelligence and Statistics (AI-Stats)},
   year = {2011},
   address = {Ft. Lauderdale, FL},
   url = {http://hal3.name/docs/#daume11olmt}
}


Domain Adaptation for Machine Translation by Mining Unseen Words
Abstract: We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches to integrating such translations into a phrase-based translation system, yielding consistent improvements in translation quality (between 0.5 and 1.5 Bleu points) on four domains and two language pairs.
Hal Daumé III, Jagadeesh Jagarlamudi
Association for Computational Linguistics, 2011@InProceedings{daume11lexicaladapt,
   author = {Hal {Daum\'e III} and Jagadeesh Jagarlamudi},
   title = {Domain Adaptation for Machine Translation by Mining Unseen Words},
   booktitle = {Association for Computational Linguistics},
   year = {2011},
   address = {Portland, OR},
   url = {http://hal3.name/docs/#daume11lexicaladapt}
}


Generative Kernels for Exponential Families
Abstract: In this paper, we propose a family of kernels for data distributions belonging to the exponential family. We call these kernels generative kernels because they take into account the generative process of the data. Our proposed method considers the geometry of the data distribution to build a set of efficient closed-form kernels best suited for that distribution. We compare our generative kernels on multinomial data and observe improved empirical performance across the board. Moreover, our generative kernels perform significantly better when the training size is small, an important property of generative models.
Arvind Agarwal, Hal Daumé III
Conference on Artificial Intelligence and Statistics (AI-Stats), 2011
@InProceedings{daume11genkern,
   author = {Arvind Agarwal and Hal {Daum\'e III}},
   title = {Generative Kernels for Exponential Families},
   booktitle = {Conference on Artificial Intelligence and Statistics (AI-Stats)},
   year = {2011},
   address = {Ft. Lauderdale, FL},
   url = {http://hal3.name/docs/#daume11genkern}
}


Leveraging Social Bookmarks from Partially Tagged Corpus for Improved Webpage Clustering
Anusua Trivedi, Piyush Rai, Hal Daumé III, Scott L. DuVall
ACM Transactions on Intelligent Systems and Technology, 2011
@Article{daume11social,
   author = {Anusua Trivedi and Piyush Rai and Hal {Daum\'e III} and Scott L. DuVall},
   title = {Leveraging Social Bookmarks from Partially Tagged Corpus for Improved Webpage Clustering},
   journal = {ACM Transactions on Intelligent Systems and Technology},
   year = {2011},
   url = {http://hal3.name/docs/#daume11social}
}


Generating Semantic Orientation Lexicon using Large Data and Thesaurus
Amit Goyal, Hal Daumé III
ACL Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA), 2011
@InProceedings{daume11wassa,
   author = {Amit Goyal and Hal {Daum\'e III}},
   title = {Generating Semantic Orientation Lexicon using Large Data and Thesaurus},
   booktitle = {Proceedings of ACL Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA)},
   year = {2011},
   address = {Portland, OR},
   url = {http://hal3.name/docs/#daume11wassa}
}


2010

Multiview Clustering with Incomplete Views
Abstract: Multiview clustering algorithms allow leveraging information from multiple views of the data and therefore lead to improved clustering. A number of kernel-based multiview clustering algorithms work by using the kernel matrices defined on the different views of the data. However, these algorithms assume availability of features from all the views of each example, i.e., assume that the kernel matrix for each view is complete. We present an approach that allows these algorithms to be applicable even when only one (the primary) view is complete and the auxiliary views are incomplete (i.e., features from these views are available only for some of the examples). Taking kernel CCA based multiview clustering as an example, we apply our method to webpage clustering with multiple views of the data, where one view is the page-text and the other view is the social tags assigned to the webpage. We consider the case when the tags are available only for a small subset of the webpages, which means that the tag view is incomplete. Experimental results establish the effectiveness of the proposed method.
Piyush Rai, Anusua Trivedi, Hal Daumé III, Scott L. DuVall
NIPS Workshop on Machine Learning for Social Computing, 2010
@InProceedings{daume10mvincomplete,
   author = {Piyush Rai and Anusua Trivedi and Hal {Daum\'e III} and Scott L. DuVall},
   title = {Multiview Clustering with Incomplete Views},
   booktitle = {NIPS Workshop on Machine Learning for Social Computing},
   year = {2010},
   address = {Whistler, Canada},
   url = {http://hal3.name/docs/#daume10mvincomplete}
}


Multitask Learning via Mixture of Linear Subspaces
Abstract: We propose a probabilistic generative model for multitask learning that exploits the cluster structure of the task parameters, and additionally imposes a low-rank constraint on the set of task parameters within each cluster. This leads to a sharing of statistical strengths of multiple tasks at two levels: (1) via a cluster assumption, and (2) via a subspace assumption within each cluster. Our work brings in the benefits of both these aspects of task relationship, each of which has been addressed only individually in prior work. We assume a mixture of linear subspaces model on the latent task parameters that can capture both these aspects simultaneously. Furthermore, the mixture of subspaces assumption can model the fact that the task parameters could potentially live on a non-linear manifold instead of a linear subspace, which is a restriction of earlier work on multitask learning based on the linear subspace assumption.
Piyush Rai, Hal Daumé III
NIPS Workshop on Transfer Learning by Learning Rich Generative Models, 2010
@InProceedings{daume10mtlmls,
   author = {Piyush Rai and Hal {Daum\'e III}},
   title = {Multitask Learning via Mixture of Linear Subspaces},
   booktitle = {NIPS Workshop on Transfer Learning by Learning Rich Generative Models},
   year = {2010},
   address = {Whistler, Canada},
   url = {http://hal3.name/docs/#daume10mtlmls}
}


Co-regularized Spectral Clustering with Multiple Kernels
Abstract: We propose a co-regularization based multiview spectral clustering algorithm which enforces the clusterings across multiple views to agree with each other. Since each view can be used to define a similarity graph over the data, our algorithm can also be considered as learning with multiple similarity graphs, or equivalently with multiple kernels. We propose an objective function that implicitly combines two (or more) kernels, and leads to improved clustering performance. Experimental comparisons with a number of baselines on several datasets establish the efficacy of our proposed approach.
Abhishek Kumar, Piyush Rai, Hal Daumé III
NIPS Workshop on New Directions in Multiple Kernel Learning, 2010
@InProceedings{daume10spectral,
   author = {Abhishek Kumar and Piyush Rai and Hal {Daum\'e III}},
   title = {Co-regularized Spectral Clustering with Multiple Kernels},
   booktitle = {NIPS Workshop on New Directions in Multiple Kernel Learning},
   year = {2010},
   address = {Whistler, Canada},
   url = {http://hal3.name/docs/#daume10spectral}
}


A Co-regularization Based Semi-supervised Domain Adaptation
Abstract: This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA++) builds on the notion of augmented space (introduced in EASYADAPT (EA) [1]) and harnesses unlabeled data in the target domain to further enable the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a pre-processing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA++ shows that the hypothesis class of EA++ has lower complexity (compared to EA) and hence results in tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical findings and demonstrate the efficacy of the proposed method when compared to EA as well as a few other baseline approaches.
Abhishek Kumar, Avishek Saha, Hal Daumé III
Conference on Neural Information Processing Systems (NIPS), 2010
@InProceedings{daume10coreg,
   author = {Abhishek Kumar and Avishek Saha and Hal {Daum\'e III}},
   title = {A Co-regularization Based Semi-supervised Domain Adaptation},
   booktitle = {Proceedings of the Conference on Neural Information Processing Systems (NIPS)},
   year = {2010},
   address = {Vancouver, Canada},
   url = {http://hal3.name/docs/#daume10coreg}
}


Learning Multiple Tasks using Manifold Regularization
Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization. We assume that all task parameters lie on a manifold, which generalizes the assumption made in the existing literature, i.e., that task parameters share a common linear subspace. The proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes into learning independent tasks, making it appealing for learning new tasks. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets.
Arvind Agarwal, Samuel Gerber, Hal Daumé III
Conference on Neural Information Processing Systems (NIPS), 2010
@InProceedings{daume10manifold,
   author = {Arvind Agarwal and Samuel Gerber and Hal {Daum\'e III}},
   title = {Learning Multiple Tasks using Manifold Regularization},
   booktitle = {Proceedings of the Conference on Neural Information Processing Systems (NIPS)},
   year = {2010},
   address = {Vancouver, Canada},
   url = {http://hal3.name/docs/#daume10manifold}
}


A geometric view of conjugate priors
Abstract: In Bayesian machine learning, conjugate priors are popular, mostly due to mathematical convenience. In this paper, we show that there are deeper reasons for choosing a conjugate prior. Specifically, we formulate the conjugate prior in the form of Bregman divergence and show that it is the inherent geometry of conjugate priors that makes them appropriate and intuitive. This geometric interpretation allows one to view the hyperparameters of conjugate priors as the effective sample points, thus providing additional intuition. We use this geometric understanding of conjugate priors to derive the hyperparameters and expression of the prior used to couple the generative and discriminative components of a hybrid model for semi-supervised learning.
Arvind Agarwal, Hal Daumé III
Machine Learning Journal (MLJ), 2010
@article{daume10conjugate,
   author = {Arvind Agarwal and Hal {Daum\'e III}},
   title = {A geometric view of conjugate priors},
   year = {2010},
   journal = {Machine Learning Journal (MLJ)},
   volume = {81},
   number = {1},
   url = {http://hal3.name/docs/#daume10conjugate}
}
   


Active Online Multitask Learning
Abstract: In this paper, we propose an online multitask learning framework where the weight vectors are updated in an adaptive fashion based on inter-task relatedness. Our work is in contrast with the earlier work on online multitask learning (Cavallanti et al., 2008) where the authors use a fixed interaction matrix of tasks to derive (fixed) update rules for all the tasks. In this work, we propose to update this interaction matrix itself in an adaptive fashion so that the weight vector updates are no longer fixed but are instead adaptive. Our framework can be extended to an active learning setting where the informativeness of an incoming instance across all the tasks can be evaluated using this adaptive interaction matrix. Empirical results on standardized datasets show improved performance in terms of accuracy, label complexity and number of mistakes made.
Avishek Saha, Piyush Rai, Hal Daumé III, Suresh Venkatasubramanian
ICML 2010 Workshop on Budgeted Learning (Budget), 2010
@inproceedings{daume10aoml,
   author = {Avishek Saha and Piyush Rai and Hal {Daum\'e III} and Suresh Venkatasubramanian},
   title = {Active Online Multitask Learning},
   booktitle = {ICML 2010 Workshop on Budgeted Learning (Budget)},
   year = {2010},
   address = {Haifa, Israel},
}


Exploiting Tag and Word Correlations for Improved Webpage Clustering
Abstract: Automatic clustering of webpages helps a number of information retrieval tasks, such as improving user interfaces, collection clustering, introducing diversity in search results, etc. Typically, webpage clustering algorithms only use features extracted from the page-text. However, the advent of social-bookmarking websites, such as StumbleUpon and Delicious, has led to a huge amount of user-generated content such as the tag information that is associated with the webpages. In this paper, we present a subspace based feature extraction approach which leverages tag information to complement the page-contents of a webpage to extract highly discriminative features, with the goal of improved clustering performance. In our approach, we consider page-text and tags as two separate views of the data, and learn a shared subspace that maximizes the correlation between the two views. Any clustering algorithm can then be applied in this subspace. We compare our subspace based approach with a number of baselines that use tag information in various other ways, and show that the subspace based approach leads to improved performance on the webpage clustering task. Although our results here are on the webpage clustering task, the same approach can be used for webpage classification as well. Finally, we suggest possible future work for leveraging tag information in webpage clustering, especially when tag information is present for only a small number of webpages.
Anusua Trivedi, Piyush Rai, Scott L. DuVall, Hal Daumé III
CIKM Workshop on Search and Mining User-generated Contents (SMUC), 2010
@inproceedings{daume10clustering,
   author = {Anusua Trivedi and Piyush Rai and Scott L. DuVall and Hal {Daum\'e III}},
   title = {Exploiting Tag and Word Correlations for Improved Webpage Clustering},
   booktitle = {Proceedings of {CIKM} Workshop on Search and Mining User-generated Contents (SMUC)},
   year = {2010},
   address = {Toronto, Canada},
}


Extracting Multilingual Topics from Unaligned Corpora
Abstract: Topic models have been studied extensively in the context of monolingual corpora. Though there are some attempts to mine topical structure from cross-lingual corpora, they require clues about document alignments. In this paper we present a generative model called JointLDA which uses a bilingual dictionary to mine multilingual topics from an unaligned corpus. Experiments conducted on different data sets confirm our conjecture that jointly modeling the cross-lingual corpora offers several advantages compared to individual monolingual models. Since the JointLDA model merges related topics in different languages into a single multilingual topic, (a) it can fit the data with relatively fewer topics, and (b) it has the ability to predict related words from a language different than that of the given document. In fact, it has better predictive power compared to the bag-of-words based translation model, leaving the possibility for JointLDA to be preferred over the bag-of-words model for cross-lingual IR applications. We also found that the monolingual models learnt while optimizing the cross-lingual corpora are more effective than the corresponding LDA models.
Jagadeesh Jagarlamudi, Hal Daumé III
European Conference on Information Retrieval (ECIR), 2010
@InProceedings{daume10multilingual,
   author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III}},
   title = {Extracting Multilingual Topics from Unaligned Corpora},
   booktitle = {Proceedings of the European Conference on Information Retrieval (ECIR)},
   year = {2010},
   address = {Milton Keynes, United Kingdom},
   url = {http://hal3.name/docs/#daume10multilingual}
}


Sketch Techniques for Scaling Distributional Similarity to the Web
Abstract: In this paper, we propose a memory, space, and time efficient framework to scale distributional similarity to the web. We exploit sketch techniques, especially the Count-Min sketch, which approximates the frequency of an item in the corpus without explicitly storing the item itself. These methods use hashing to deal with massive amounts of streaming text. We store all item counts computed from 90 GB of web data in just 2 billion counters (8 GB main memory) of a CM sketch. Our method returns semantic similarity between word pairs in O(K) time and can compute similarity between any word pairs that are stored in the sketch. In our experiments, we show that our framework is as effective as using the exact counts.
Amit Goyal, Jagadeesh Jagarlamudi, Hal Daumé III, Suresh Venkatasubramanian
GEometrical Models of Natural Language Semantics Workshop (GEMS) at ACL, 2010
@InProceedings{daume10distsim,
   author = {Amit Goyal and Jagadeesh Jagarlamudi and Hal {Daum\'e III} and Suresh Venkatasubramanian},
   title = {Sketch Techniques for Scaling Distributional Similarity to the Web},
   booktitle = {GEometrical Models of Natural Language Semantics Workshop (GEMS) at ACL},
   year = {2010},
   address = {Uppsala, Sweden},
   url = {http://hal3.name/docs/#daume10distsim}
}
   


Automatically Producing Plot Unit Representations for Narrative Text
Amit Goyal, Ellen Riloff, Hal Daumé III
Empirical Methods in Natural Language Processing (EMNLP), 2010
@InProceedings{daume10plotunits-emnlp,
   author = {Amit Goyal and Ellen Riloff and Hal {Daum\'e III}},
   title = {Automatically Producing Plot Unit Representations for Narrative Text},
   booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
   year = {2010},
   address = {Boston, MA},
   url = {http://hal3.name/docs/#daume10plotunits-emnlp}
}


Kernelized Sorting for Natural Language Processing
Abstract: Kernelized sorting is an approach for matching objects from two sources (or domains) that does not require any prior notion of similarity between objects across the two sources. Unfortunately, this technique is highly sensitive to initialization and to high-dimensional data. We present variants of kernelized sorting to increase its robustness and performance on several Natural Language Processing (NLP) tasks: document matching from parallel and comparable corpora, machine transliteration and even image processing. Empirically we show that, on these tasks, a semi-supervised variant of kernelized sorting outperforms matching canonical correlation analysis.
Jagadeesh Jagarlamudi, Seth Juarez, Hal Daumé III
Conference on Artificial Intelligence (AAAI), 2010
@InProceedings{daume10sorting,
   author = {Jagadeesh Jagarlamudi and Seth Juarez and Hal {Daum\'e III}},
   title = {Kernelized Sorting for Natural Language Processing},
   booktitle = {Proceedings of the Conference on Artificial Intelligence (AAAI)},
   year = {2010},
   address = {Atlanta, Georgia},
   url = {http://hal3.name/docs/#daume10sorting}
}


Toward Plot Units: Automatic Affect State Analysis
Abstract: We present a system called AESOP that automatically produces affect states associated with characters in a story. This research represents a first step toward the automatic generation of plot unit structures from text. AESOP incorporates several existing sentiment analysis tools and lexicons to evaluate the effectiveness of current sentiment technology on this task. AESOP also includes two novel components: a method for acquiring patient polarity verbs, which impart negative affect on their patients, and affect projection rules to propagate affect tags from surrounding words onto the characters in the story. We evaluate AESOP on a small collection of fables.
Amit Goyal, Ellen Riloff, Hal Daumé III, Nathan Gilbert
HLT/NAACL Workshop on Computational Approaches to Analysis and Generation of Emotion in Text (CAET), 2010
@InProceedings{daume10plotunits,
   author = {Amit Goyal and Ellen Riloff and Hal {Daum\'e III} and Nathan Gilbert},
   title = {Toward Plot Units: Automatic Affect State Analysis},
   booktitle = {Proceedings of HLT/NAACL Workshop on Computational Approaches to Analysis and Generation of Emotion in Text (CAET)},
   year = {2010},
   address = {Los Angeles, CA},
   url = {http://hal3.name/docs/#daume10plotunits}
}


Domain Adaptation meets Active Learning
Abstract: In this work, we show how active learning in some (target) domain can leverage information from a different but related (source) domain. We present an algorithm that harnesses the source domain data to learn the best possible initializer hypothesis for doing active learning in the target domain, resulting in improved label complexity. We also present a variant of this algorithm which additionally uses the domain divergence information to selectively query the most informative points in the target domain, leading to further reductions in label complexity. Experimental results on a variety of datasets establish the efficacy of the proposed methods.
Piyush Rai, Avishek Saha, Hal Daumé III, Suresh Venkatasubramanian
HLT/NAACL Workshop on Active Learning for NLP (ALNLP), 2010
@InProceedings{daume10daal,
   author = {Piyush Rai and Avishek Saha and Hal {Daum\'e III} and Suresh Venkatasubramanian},
   title = {Domain Adaptation meets Active Learning},
   booktitle = {Proceedings of HLT/NAACL Workshop on Active Learning for NLP (ALNLP)},
   year = {2010},
   address = {Los Angeles, CA},
   url = {http://hal3.name/docs/#daume10daal}
}


Infinite Predictor Subspace Models for Multitask Learning
Abstract: Given several related learning tasks, we propose a nonparametric Bayesian model that captures task relatedness by assuming that the task parameters (i.e., predictors) share a latent subspace. More specifically, the intrinsic dimensionality of the task subspace is not assumed to be known a priori. We use an infinite latent feature model to automatically infer this number (depending on and limited by only the number of tasks). Furthermore, our approach is applicable when the underlying task parameter subspace is inherently sparse, drawing parallels with l1 regularization and LASSO-style models. We also propose an augmented model which can make use of (labeled, and additionally unlabeled if available) inputs to assist learning this subspace, leading to further improvements in performance. Experimental results demonstrate the efficacy of both the proposed approaches, especially when the number of examples per task is small. Finally, we discuss an extension of the proposed framework where a nonparametric mixture of linear subspaces can be used to learn a nonlinear manifold over the task parameters, and also deal with the issue of negative transfer from unrelated tasks.
Piyush Rai, Hal Daumé III
Conference on Artificial Intelligence and Statistics (AI-Stats), 2010
@InProceedings{daume10subspace,
   author = {Piyush Rai and Hal {Daum\'e III}},
   title = {Infinite Predictor Subspace Models for Multitask Learning},
   booktitle = {Proceedings of the Conference on Artificial Intelligence and Statistics (AI-Stats)},
   year = {2010},
   address = {Sardinia, Italy},
   url = {http://hal3.name/docs/#daume10subspace}
}


Sketching Techniques for Large Scale NLP
Abstract: In this paper, we address the challenges posed by large amounts of text data by exploiting the power of hashing in the context of streaming data. We explore sketch techniques, especially the Count-Min Sketch, which approximates the frequency of a word pair in the corpus without explicitly storing the word pairs themselves. We use the idea of a conservative update with the Count-Min Sketch to reduce the average relative error of its approximate counts by a factor of two. We show that it is possible to store all word and word-pair counts computed from 37 GB of web data in just 2 billion counters (8 GB RAM). The number of these counters is up to 30 times smaller than the stream size, which is a big memory and space gain. In Semantic Orientation experiments, the PMI scores computed from 2 billion counters are as effective as exact PMI scores.
Amit Goyal, Jagadeesh Jagarlamudi, Hal Daumé III, Suresh Venkatasubramanian
HLT/NAACL Workshop on the Web as a Corpus (WAC), 2010
@InProceedings{daume10sketch,
   author = {Amit Goyal and Jagadeesh Jagarlamudi and Hal {Daum\'e III} and Suresh Venkatasubramanian},
   title = {Sketching Techniques for Large Scale {NLP}},
   booktitle = {Proceedings of HLT/NAACL Workshop on the Web as a Corpus (WAC)},
   year = {2010},
   address = {Los Angeles, CA},
   url = {http://hal3.name/docs/#daume10sketch}
}
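
The abstract above names the Count-Min Sketch and its conservative-update variant. The following is a minimal Python sketch of the textbook data structure, not the authors' implementation; the class name, the choice of salted BLAKE2b hashing, and the tiny table sizes are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Textbook Count-Min Sketch; conservative=True enables conservative update."""

    def __init__(self, width, depth, conservative=False):
        self.width = width              # counters per row
        self.depth = depth              # number of rows (one hash function each)
        self.conservative = conservative
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Illustrative hash family: BLAKE2b salted with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        cells = [(row, self._index(item, row)) for row in range(self.depth)]
        if self.conservative:
            # Raise each cell only as far as the new lower bound on the count.
            target = min(self.table[r][c] for r, c in cells) + count
            for r, c in cells:
                if self.table[r][c] < target:
                    self.table[r][c] = target
        else:
            for r, c in cells:
                self.table[r][c] += count

    def query(self, item):
        # The minimum over rows is an upper bound on the true count.
        return min(self.table[r][self._index(item, r)] for r in range(self.depth))
```

Both variants can only overestimate a count. With the same hash functions, the conservative-update estimate is never larger than the plain estimate, which is the source of the error reduction the abstract reports.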


2009

Multitask Learning using Nonparametrically Learned Predictor Subspaces
Abstract: Given several related learning tasks, we propose a nonparametric Bayesian learning model that captures task relatedness by assuming that the task parameters (i.e., weight vectors) share a latent subspace. More specifically, the intrinsic dimensionality of this subspace is not assumed to be known a priori. We use an infinite latent feature model - the Indian Buffet Process - to automatically infer this number. We also propose extensions of this model where the subspace learning can incorporate (labeled, and additionally unlabeled if available) examples, or the task parameters share a mixture of subspaces, instead of sharing a single subspace. The latter property can allow learning nonlinear manifold structure underlying the task parameters, and can also help in preventing negative transfer from outlier tasks.
Piyush Rai, Hal Daumé III
NIPS Workshop on Learning from Multiple Sources, 2009
@InProceedings{daume09subspacemtl,
   author = {Piyush Rai and Hal {Daum\'e III}},
   title = {Multitask Learning using Nonparametrically Learned Predictor Subspaces},
   booktitle = {NIPS Workshop on Learning from Multiple Sources},
   year = {2009},
   address = {Whistler, Canada},
   url = {http://hal3.name/docs/#daume09subspacemtl}
}


Semi-supervised or Semi-unsupervised?
Hal Daumé III
Unpublished, 2009
@Misc{daume09sslnlp,
   author = {Hal {Daum\'e III}},
   title = {Semi-supervised or Semi-unsupervised?},
   howpublished = {Invited paper: NAACL-HLT Workshop on Semi-supervised Learning in NLP (SSLNLP)},
   year = {2009},
   address = {Boulder, CO},
   url = {http://hal3.name/docs/#daume09sslnlp}
}


Search-based Structured Prediction
Abstract: We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that transforms these complex problems into simple classification problems to which any binary classifier may be applied. Unlike current algorithms for structured learning that require decomposition of both the loss function and the feature functions over the predicted structure, Searn is able to learn prediction functions for any loss function and any class of features. Moreover, Searn comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.
Hal Daumé III, John Langford, Daniel Marcu
Machine Learning Journal (MLJ), 2009
@article{daume09searn,
   author = {Hal {Daum\'e III} and John Langford and Daniel Marcu},
   title = {Search-based Structured Prediction},
   year = {2009},
   journal = {Machine Learning Journal (MLJ)},
   url = {http://hal3.name/docs/#daume09searn}
}
   


Factor Regression Combining Heterogeneous Sources of Information
Abstract: We present a non-parametric Bayesian factor regression model that combines two heterogeneous sources of information: gene expression arrays and text from their corresponding PubMed abstracts. Our model approximates a pLSI style model and results in improved regression accuracy. We apply this model to gene-expression data analysis, but it is extendable to other problems exhibiting a similar heterogeneous multiplicity in sources of information, like financial analysis, weather prediction and others.
Amrish Kapoor, Piyush Rai, Hal Daumé III
NIPS Workshop on Learning From Multiple Sources with Applications to Robotics (LMS), 2009
@InProceedings{daume09hetero,
   author = {Amrish Kapoor and Piyush Rai and Hal {Daum\'e III}},
   title = {Factor Regression Combining Heterogeneous Sources of Information},
   booktitle = {Proceedings of NIPS Workshop on Learning From Multiple Sources with Applications to Robotics (LMS)},
   year = {2009},
   address = {Vancouver, Canada},
   url = {http://hal3.name/docs/#daume09hetero}
}


Streamed Learning: One-Pass SVMs
Abstract: We present a streaming model for large-scale classification (in the context of l2-SVM) by leveraging connections between learning and computational geometry. The streaming model imposes the constraint that only a single pass over the data is allowed. The l2-SVM is known to have an equivalent formulation in terms of the minimum enclosing ball (MEB) problem, and an efficient algorithm based on the idea of core sets exists (CVM) [Tsang et al., 2005]. CVM learns a (1+ε)-approximate MEB for a set of points and yields an approximate solution to the corresponding SVM instance. However, CVM works in batch mode, requiring multiple passes over the data. This paper presents a single-pass SVM which is based on the minimum enclosing ball of streaming data. We show that the MEB updates for the streaming case can be easily adapted to learn the SVM weight vector in a way similar to using online stochastic gradient updates. Our algorithm performs polylogarithmic computation at each example, and requires very small and constant storage. Experimental results show that, even in such restrictive settings, we can learn efficiently in just one pass and get accuracies comparable to other state-of-the-art SVM solvers (batch and online). We also give an analysis of the algorithm, and discuss some open issues and possible extensions.
Piyush Rai, Hal Daumé III, Suresh Venkatasubramanian
International Joint Conference on Artificial Intelligence (IJCAI), 2009
@InProceedings{daume09onepass,
   author = {Piyush Rai and Hal {Daum\'e III} and Suresh Venkatasubramanian},
   title = {Streamed Learning: One-Pass {SVM}s},
   booktitle = {International Joint Conference on Artificial Intelligence (IJCAI)},
   year = {2009},
   address = {Pasadena, CA},
   url = {http://hal3.name/docs/#daume09onepass}
}
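
The abstract above builds on minimum enclosing ball (MEB) updates over a stream. As a rough illustration of what a one-pass MEB update can look like, the sketch below grows the ball to the smallest one containing both the old ball and each new outlying point. This is a generic streaming heuristic written under my own assumptions, not the paper's SVM algorithm; the function name `streaming_meb` is hypothetical.

```python
import math


def streaming_meb(points):
    """One pass over points; returns (center, radius) of a ball enclosing all of them."""
    stream = iter(points)
    center = list(next(stream))   # the first point is a ball of radius zero
    radius = 0.0
    for point in stream:
        dist = math.dist(center, point)
        if dist > radius:
            # Smallest ball containing the old ball and the new point:
            # shift the center toward the point and grow the radius by the same amount.
            shift = (dist - radius) / 2.0
            center = [c + shift * (x - c) / dist for c, x in zip(center, point)]
            radius += shift
    return center, radius
```

Each point is examined once and only the current center and radius are stored, which matches the constant-storage, single-pass constraint described in the abstract. The resulting ball encloses every point but is in general larger than the exact MEB.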
   


Multi-Label Prediction via Sparse Infinite CCA
Abstract: Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction.
Piyush Rai, Hal Daumé III
Conference on Neural Information Processing Systems (NIPS), 2009
@InProceedings{daume09cca,
   author = {Piyush Rai and Hal {Daum\'e III}},
   title = {Multi-Label Prediction via Sparse Infinite {CCA}},
   booktitle = {Proceedings of the Conference on Neural Information Processing Systems (NIPS)},
   year = {2009},
   address = {Vancouver, Canada},
   url = {http://hal3.name/docs/#daume09cca}
}


Unsupervised Part of Speech Tagging Without a Lexicon
Adam R. Teichert, Hal Daumé III
NIPS Workshop on Grammar Induction, Representation of Language and Language Learning (GIRLLL), 2009
@InProceedings{daume09typpos,
   author = {Adam R. Teichert and Hal {Daum\'e III}},
   title = {Unsupervised Part of Speech Tagging Without a Lexicon},
   booktitle = {NIPS Workshop on Grammar Induction, Representation of Language and Language Learning (GIRLLL)},
   year = {2009},
   address = {Vancouver, Canada},
   url = {http://hal3.name/docs/#daume09typpos}
}


Fast Search for Infinite Latent Feature ModelsAbstract     We propose several search based alternatives for inference in Indian Buffet Process (IBP) based models. We consider the case when we only want a maximum a posteriori (MAP) estimate of the latent feature assignment matrix. If true posterior samples are required, these MAP estimates can also serve as intelligent initializers for MCMC based algorithms. Another advantage of the proposed methods is that they can process one observation at a time, making it possible to do inference in an online setting. Experimental evidence suggests that these algorithms can give us computational benefits of an order of magnitude over Gibbs sampling (or its sequential variant - the particle filter) traditionally used in IBP based models.
Piyush Rai, Hal Daumé III
NIPS Workshop on Non-parametric Bayes (NP-Bayes), 2009@InProceedings{daume09ibpsearch,
   author = {Piyush Rai and Hal {Daum\'e III}},
   title = {Fast Search for Infinite Latent Feature Models},
   booktitle = {Proceedings of NIPS Workshop on Non-parametric Bayes (NP-Bayes)},
   year = {2009},
   address = {Vancouver, Canada},
   url = {http://hal3.name/docs/#daume09ibpsearch}
}


Exponential Family Hybrid Semi-Supervised LearningAbstract     We present an approach to semi-supervised learning based on an exponential family characterization. Our approach generalizes previous work on coupled priors for hybrid generative/discriminative models. Our model is more flexible and natural than previous approaches. Experimental results on several data sets show that our approach also performs better in practice.
Arvind Agarwal, Hal Daumé III
International Joint Conference on Artificial Intelligence (IJCAI), 2009@InProceedings{daume09hybrid,
   author = {Arvind Agarwal and Hal {Daum\'e III}},
   title = {Exponential Family Hybrid Semi-Supervised Learning},
   booktitle = {International Joint Conference on Artificial Intelligence (IJCAI)},
   year = {2009},
   address = {Pasadena, CA},
   url = {http://hal3.name/docs/#daume09hybrid}
}
   


Markov Random Topic FieldsAbstract     Most approaches to topic modeling assume an independence between documents that is frequently violated. We present a topic model that makes use of one or more user-specified graphs describing relationships between documents. These graphs are encoded in the form of a Markov random field over topics and serve to encourage related documents to have similar topic structures. Experiments show upwards of a 10% improvement in modeling performance.
Hal Daumé III
Association for Computational Linguistics (ACL), 2009@InProceedings{daume09mrtf,
   author = {Hal {Daum\'e III}},
   title = {Markov Random Topic Fields},
   booktitle = {Association for Computational Linguistics (ACL)},
   year = {2009},
   address = {Singapore},
   url = {http://hal3.name/docs/#daume09mrtf}
}
   


Unsupervised Search-based Structured PredictionAbstract     We describe an adaptation and application of a search-based structured prediction algorithm "Searn" to unsupervised learning problems. We show that it is possible to reduce unsupervised learning to supervised learning and demonstrate a high-quality unsupervised shift-reduce parsing model. We additionally show a close connection between unsupervised Searn and expectation maximization. Finally, we demonstrate the efficacy of a semi-supervised extension. The key idea that enables this is an application of the predict-self idea for unsupervised learning.
Hal Daumé III
International Conference on Machine Learning (ICML), 2009@InProceedings{daume09unsearn,
   author = {Hal {Daum\'e III}},
   title = {Unsupervised Search-based Structured Prediction},
   booktitle = {International Conference on Machine Learning (ICML)},
   year = {2009},
   address = {Montreal, Canada},
   url = {http://hal3.name/docs/#daume09unsearn}
}
   


A Bayesian Statistics Approach to Multiscale Coarse GrainingAbstract     Coarse-grained (CG) modeling provides a promising way to investigate many important physical and biological phenomena over large spatial and temporal scales. The multiscale coarse-graining (MS-CG) method has been proven to be a thermodynamically consistent way to systematically derive a CG model from atomistic force information, as shown in a variety of systems, ranging from simple liquids to proteins embedded in lipid bilayers. In the present work, Bayes' theorem, an advanced statistical tool widely used in signal processing and pattern recognition, is adopted to further improve the MS-CG force field obtained from the CG modeling. This approach can regularize the linear equation resulting from the underlying force-matching methodology, therefore substantially improving the quality of the MS-CG force field, especially for the regions with limited sampling. Moreover, this Bayesian approach can naturally provide an error estimation for each force field parameter, from which one can know the extent to which the results can be trusted. The robustness and accuracy of the Bayesian MS-CG algorithm are demonstrated for three different systems, including simple liquid methanol, polyalanine peptide solvated in explicit water, and a much more complicated peptide assembly with 32 NNQQNY hexapeptides.
Pu Liu, Qiang Shi, Hal Daumé III, Gregory Voth
Journal of Chemical Physics (J.ChPhys), 2009@Article{daume09graining,
   author = {Pu Liu and Qiang Shi and Hal {Daum\'e III} and Gregory Voth},
   title = {A Bayesian Statistics Approach to Multiscale Coarse Graining},
   journal = {Journal of Chemical Physics (J.ChPhys)},
   year = {2009},
   volume = {129},
   number = {21},
   pages = {214114},
   month = {December},
}


Bayesian Multitask Learning with Latent HierarchiesAbstract     We learn multiple hypotheses for related tasks under a latent hierarchical relationship between tasks. We exploit the intuition that for domain adaptation, we wish to share classifier structure, but for multitask learning, we wish to share covariance structure. Our hierarchical model is seen to subsume several previously proposed multitask learning models and performs well on three distinct real-world data sets.
Hal Daumé III
Conference on Uncertainty in Artificial Intelligence (UAI), 2009@InProceedings{daume09hiermtl,
   author = {Hal {Daum\'e III}},
   title = {Bayesian Multitask Learning with Latent Hierarchies},
   booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
   year = {2009},
   address = {Montreal, Canada},
   url = {http://hal3.name/docs/#daume09hiermtl}
}
   


Streaming for Large Scale NLP: Language ModelingAbstract     In this paper, we explore a streaming algorithm paradigm to handle large amounts of data for NLP problems. We present an efficient low-memory method for constructing high-order approximate n-gram frequency counts. The method is based on a deterministic streaming algorithm which efficiently computes approximate frequency counts over a stream of data while employing a small memory footprint. We show that this method easily scales to billion-word monolingual corpora using a conventional (4 GB RAM) desktop machine. Statistical machine translation experimental results corroborate that the resulting high-n approximate small language model is as effective as models obtained from other count pruning methods.
Amit Goyal, Hal Daumé III, Suresh Venkatasubramanian
North American Chapter of the Association for Computational Linguistics (NAACL), 2009@InProceedings{daume09streaming,
   author = {Amit Goyal and Hal {Daum\'e III} and Suresh Venkatasubramanian},
   title = {Streaming for Large Scale {NLP}: Language Modeling},
   booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)},
   year = {2009},
   address = {Boulder, CO},
   url = {http://hal3.name/docs/#daume09streaming}
}
   


Non-Parametric Bayesian Model Areal LinguisticsAbstract     We describe a statistical model over linguistic areas and phylogeny. Our model recovers known areas and identifies a plausible hierarchy of areal features. The use of areas improves genetic reconstruction of languages both qualitatively and quantitatively according to a variety of metrics. We model linguistic areas by a Pitman-Yor process and linguistic phylogeny by Kingman's coalescent.
Hal Daumé III
North American Chapter of the Association for Computational Linguistics (NAACL), 2009@InProceedings{daume09areal,
   author = {Hal {Daum\'e III}},
   title = {Non-Parametric {B}ayesian Model Areal Linguistics},
   booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)},
   year = {2009},
   address = {Boulder, CO},
   url = {http://hal3.name/docs/#daume09areal}
}
   


2008

The Infinite Hierarchical Factor Regression Model
Piyush Rai, Hal Daumé III
Conference on Neural Information Processing Systems (NIPS), 2008@InProceedings{daume08ihfrm,
   author = {Piyush Rai and Hal {Daum\'e III}},
   title = {The Infinite Hierarchical Factor Regression Model},
   booktitle = {Proceedings of the Conference on Neural Information Processing Systems (NIPS)},
   year = {2008},
   address = {Vancouver, Canada},
   url = {http://hal3.name/docs/#daume08ihfrm}
}


Name Translation in Statistical Machine Translation: Learning When to TransliterateAbstract     We present a method to transliterate names in the framework of end-to-end statistical machine translation. The system is trained to learn when to transliterate. For Arabic to English MT, we developed and trained a transliterator on a bitext of 7 million sentences and Google's English terabyte ngrams and achieved better name translation accuracy than 3 out of 4 professional translators. The paper also includes a discussion of challenges in name translation evaluation.
Ulf Hermjakob, Kevin Knight, Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2008@InProceedings{daume08transliterate,
   author = {Ulf Hermjakob and Kevin Knight and Hal {Daum\'e III}},
   title = {Name Translation in Statistical Machine Translation: Learning When to Transliterate},
   booktitle = {Conference of the Association for Computational Linguistics (ACL)},
   year = {2008},
   address = {Columbus, OH},
   url = {http://hal3.name/docs/#daume08transliterate}
}
   


Structure Compilation: Trading Structure for FeaturesAbstract     Structured models often achieve excellent performance but can be slow at test time. We investigate structure compilation, where we replace structure with features, which are often computationally simpler but unfortunately statistically more complex. We analyze this tradeoff theoretically and empirically on three natural language processing tasks. We also introduce a simple method to transfer predictive power from structure to features via unlabeled data, while incurring a minimal statistical penalty.
Percy Liang, Hal Daumé III, Dan Klein
International Conference on Machine Learning (ICML), 2008@InProceedings{daume08flat,
   author = {Percy Liang and Hal {Daum\'e III} and Dan Klein},
   title = {Structure Compilation: Trading Structure for Features},
   booktitle = {International Conference on Machine Learning (ICML)},
   year = {2008},
   address = {Helsinki, Finland},
   url = {http://hal3.name/docs/#daume08flat}
}
   


Cross-Task Knowledge-Constrained Self TrainingAbstract     We present an algorithmic framework for learning multiple related tasks. Our framework exploits a form of prior knowledge that relates the output spaces of these tasks. We present PAC learning results that analyze the conditions under which such learning is possible. We present results on learning a shallow parser and named-entity recognition system that exploits our framework, showing consistent improvements over baseline methods.
Hal Daumé III
Empirical Methods in Natural Language Processing (EMNLP), 2008@InProceedings{daume08hints,
   author = {Hal {Daum\'e III}},
   title = {Cross-Task Knowledge-Constrained Self Training},
   booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
   year = {2008},
   address = {Honolulu, Hawaii},
   url = {http://hal3.name/docs/#daume08hints}
}
   


Perceptron-based Coherence PredictorsAbstract     Coherence misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important workloads. Just as branch predictors reduce the performance impact of branches, coherence predictors can reduce the performance impact of coherence misses. Two-level pattern-based coherence predictors have offered a general prediction method to trigger appropriate coherence actions. This paper presents the design and evaluation of a perceptron-based coherence predictor that extends a conventional directory-based write-invalidate protocol to predict when to push updates to remote nodes. When predicted correctly, the update eliminates a coherence miss on the remote node. We also present a simple mechanism for predicting to which nodes we should push updates. We evaluate our perceptron-based update predictor on a variety of SPLASH-2 and PARSEC benchmarks. Simulation indicates that the update predictor eliminates an average of 30% of coherence misses. Our simple consumer prediction mechanism sent very few useless updates of updates were consumed (eliminated misses).
Devyani Ghosh, John Carter, Hal Daumé III
2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects (ICSA), 2008@InProceedings{daume08coherence,
   author = {Devyani Ghosh and John Carter and Hal {Daum\'e III}},
   title = {Perceptron-based Coherence Predictors},
   booktitle = {Proceedings of the 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects (ICSA)},
   year = {2008},
   address = {Beijing, China},
   url = {http://hal3.name/docs/#daume08coherence}
}
   


2007

A Bayesian Model for Discovering Typological ImplicationsAbstract     A standard form of analysis for linguistic typology is the universal implication. These implications state facts about the range of extant languages, such as "if objects come after verbs, then adjectives come after nouns." Such implications are typically discovered by painstaking hand analysis over a small sample of languages. We propose a computational model for assisting in this process. Our model is able to discover both well-known implications and some novel implications that deserve further study. Moreover, through a careful application of hierarchical analysis, we are able to cope with the well-known sampling problem: languages are not independent.
Hal Daumé III, Lyle Campbell
Conference of the Association for Computational Linguistics (ACL), 2007@InProceedings{daume07implication,
   author = {Hal {Daum\'e III} and Lyle Campbell},
   title = {A {B}ayesian Model for Discovering Typological Implications},
   booktitle = {Conference of the Association for Computational Linguistics (ACL)},
   year = {2007},
   address = {Prague, Czech Republic},
   url = {http://hal3.name/docs/#daume07implication}
}
   


Fast search for Dirichlet process mixture modelsAbstract     Dirichlet process (DP) mixture models provide a flexible Bayesian framework for density estimation. Unfortunately, their flexibility comes at a cost: inference in DP mixture models is computationally expensive, even when conjugate distributions are used. In the common case when one seeks only a maximum a posteriori assignment of data points to clusters, we show that search algorithms provide a practical alternative to expensive MCMC and variational techniques. When a true posterior sample is desired, the solution found by search can serve as a good initializer for MCMC. Experimental results show that using these techniques it is possible to apply DP mixture models to very large data sets.
Hal Daumé III
Eleventh International Conference on Artificial Intelligence and Statistics (AIStats), 2007@InProceedings{daume07astar-dp,
   author = {Hal {Daum\'e III}},
   title = {Fast search for Dirichlet process mixture models},
   booktitle = {Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AIStats)},
   year = {2007},
   address = {San Juan, Puerto Rico},
   url = {http://hal3.name/docs/#daume07astar-dp}
}


Frustratingly Easy Domain AdaptationAbstract     We describe an approach to domain adaptation that is appropriate exactly in the case when one has enough "target" data to do slightly better than just using only "source" data. Our approach is incredibly simple, easy to implement as a preprocessing step (10 lines of Perl!) and outperforms state-of-the-art approaches on a range of datasets. The technique comes with several simple theoretical guarantees. Moreover, it is trivially extended to a multi-domain adaptation problem, where one has data from a variety of different domains.
Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2007@InProceedings{daume07easyadapt,
   author = {Hal {Daum\'e III}},
   title = {Frustratingly Easy Domain Adaptation},
   booktitle = {Conference of the Association for Computational Linguistics (ACL)},
   year = {2007},
   address = {Prague, Czech Republic}
}


Bayesian Agglomerative Clustering with CoalescentsAbstract     We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.
Yee Whye Teh, Hal Daumé III, Daniel Roy
Conference on Neural Information Processing Systems (NIPS), 2007@InProceedings{daume07coalescent,
   author = {Yee Whye Teh and Hal {Daum\'e III} and Daniel Roy},
   title = {Bayesian Agglomerative Clustering with Coalescents},
   booktitle = {Proceedings of the Conference on Neural Information Processing Systems (NIPS)},
   year = {2007},
   address = {Vancouver, Canada},
   url = {http://hal3.name/docs/#daume07coalescent}
}


2006

Bayesian Query-Focused SummarizationAbstract     We present BayeSum (for "Bayesian summarization"), a model for sentence extraction in query-focused summarization. BayeSum leverages the common case in which multiple documents are relevant to a single query. Using these documents as reinforcement for query terms, BayeSum is not afflicted by the paucity of information in short queries. We show that approximate inference in BayeSum is possible on large data sets and results in a state-of-the-art summarization system. Furthermore, we show how BayeSum can be understood as a justified query expansion technique in the language modeling for IR framework.
Hal Daumé III, Daniel Marcu
Conference of the Association for Computational Linguistics (ACL), 2006@InProceedings{daume06bqfs,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {Bayesian Query-Focused Summarization},
   booktitle = {Proceedings of the Conference of the Association for Computational Linguistics (ACL)},
   year = {2006},
   address = {Sydney, Australia},
   url = {http://hal3.name/docs/#daume06bqfs}
}
   


Practical Structured Learning Techniques for Natural Language Processing
Hal Daumé III
Ph.D. Thesis, 2006@PhdThesis{daume06thesis,
   author = {Hal {Daum\'e III}},
   title = {Practical Structured Learning Techniques for Natural Language Processing},
   school = {University of Southern California},
   year = {2006},
   address = {Los Angeles, CA},
   month = {August},
   url = {http://hal3.name/docs/#daume06thesis}
}
   


Domain Adaptation for Statistical ClassifiersAbstract     The most basic assumption used in statistical learning theory is that training data and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the "in-domain" test data is drawn from a distribution that is related, but not identical, to the "out-of-domain" distribution of the training data. We consider the common case in which labeled out-of-domain data is plentiful, but labeled in-domain data is scarce. We introduce a statistical formulation of this problem in terms of a simple mixture model and present an instantiation of this framework to maximum entropy classifiers and their linear chain counterparts. We present efficient inference algorithms for this special case based on the technique of conditional expectation maximization. Our experimental results show that our approach leads to improved performance on three real world tasks on four different data sets from the natural language processing domain.
Hal Daumé III, Daniel Marcu
Journal of Artificial Intelligence Research (JAIR), 2006@article{daume06megam,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {Domain Adaptation for Statistical Classifiers},
   journal = {Journal of Artificial Intelligence Research (JAIR)},
   year = {2006},
   volume = {26},
   pages = {101--126},
   url = {http://hal3.name/docs/#daume06megam}
}
   


Searn in PracticeAbstract     We recently introduced an algorithm, Searn, for solving hard structured prediction problems. This algorithm enjoys many nice properties: efficiency, wide applicability, theoretical justification and simplicity. However, under a desire to fit a lot of information into the original paper, it may not be so clear how simple the technique is. This report is designed to showcase how Searn can be applied to a wide variety of techniques and what really goes on behind the scenes. We will make use of three example problems, ranging from simple to complex. These are: (1) sequence labeling, (2) parsing and (3) machine translation. (These were chosen to be as widely understandable, especially in the NLP community, as possible.) In the end, we will come back to discuss Searn for general problems.
Hal Daumé III, John Langford, Daniel Marcu
Unpublished, 2006@unpublished{daume06searn-practice,
   author = {Hal {Daum\'e III} and John Langford and Daniel Marcu},
   title = {Searn in Practice},
   year = {2006},
   url = {http://hal3.name/docs/#daume06searn-practice}
}
   


2005

A Bayesian Model for Supervised Clustering with the Dirichlet Process PriorAbstract     We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in problems such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the non-parametric Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these "reference types") that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple -- but general -- parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three real-world tasks, comparing it against both unsupervised and state-of-the-art supervised algorithms. Our results show that our model is able to outperform other models for this task across a variety of performance metrics.
Hal Daumé III, Daniel Marcu
Journal of Machine Learning Research (JMLR), 2005@Article{daume05dpscm,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {A {B}ayesian Model for Supervised Clustering with the {D}irichlet Process Prior},
   journal = {Journal of Machine Learning Research (JMLR)},
   year = {2005},
   month = {September},
   volume = {6},
   pages = {1551--1577},
   url = {http://hal3.name/docs/#daume05dpscm}
}


Bayesian Multi-Document Summarization at MSEAbstract     We describe our entry into the Multilingual Summarization Evaluation (MSE) competition for evaluating generic multi-document summarization systems, where documents are drawn both from English data and English translations of Arabic data. Our system is based on a Bayesian Query-Focused Summarization model, adapted to the generic, multi-document setting and tuned against the ROUGE evaluation metric. In the human pyramid-based evaluation, our system scored an average of 0.530, approximately 8% better than the next best system, which scored 0.489. In the automatic evaluation, our system scored 0.157 (behind four other sites) with the skip-bigram evaluation, and 0.131 (behind two other sites) with the standard bigram evaluation.
Hal Daumé III, Daniel Marcu
Workshop on Multilingual Summarization Evaluation (MSE), 2005@InProceedings{daume05mse,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {Bayesian Multi-Document Summarization at MSE},
   booktitle = {Proceedings of the Workshop on Multilingual Summarization Evaluation (MSE)},
   year = {2005},
   address = {Ann Arbor, MI},
   month = {June 29},
   url = {http://hal3.name/docs/#daume05mse}
}


A Large-Scale Exploration of Effective Global Features for a Joint Entity Detection and Tracking ModelAbstract     Entity detection and tracking (EDT) is the task of identifying textual mentions of real-world entities in documents, extending the named entity detection and coreference resolution task by considering mentions other than names (pronouns, definite descriptions, etc.). Like NE tagging and coreference resolution, most solutions to the EDT task separate out the mention detection aspect from the coreference aspect. By doing so, these solutions are limited to using only local features for learning. In contrast, by modeling both aspects of the EDT task simultaneously, we are able to learn using highly complex, non-local features. We develop a new joint EDT model and explore the utility of many features, demonstrating their effectiveness on this task.
Hal Daumé III, Daniel Marcu
Joint Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT/EMNLP), 2005@InProceedings{daume05coref,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {A Large-Scale Exploration of Effective Global Features for a Joint Entity Detection and Tracking Model},
   booktitle = {Joint Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT/EMNLP)},
   year = {2005},
   address = {Vancouver, Canada},
   url = {http://hal3.name/docs/#daume05coref}
}
   


Learning as Search Optimization: Approximate Large Margin Methods for Structured PredictionAbstract     Mappings to structured output spaces (strings, trees, partitions, etc.) are typically learned using extensions of classification algorithms to simple graphical structures (e.g., linear chains) in which search and parameter estimation can be performed exactly. Unfortunately, in many complex problems, it is rare that exact search or parameter estimation is tractable. Instead of learning exact models and searching via heuristic means, we embrace this difficulty and treat the structured output problem in terms of approximate search. We present a framework for learning as search optimization, and two parameter updates with convergence theorems and bounds. Empirical evidence shows that our integrated approach to learning and decoding can outperform exact models at smaller computational cost.
Hal Daumé III, Daniel Marcu
International Conference on Machine Learning (ICML), 2005@InProceedings{daume05laso,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction},
   booktitle = {International Conference on Machine Learning (ICML)},
   year = {2005},
   address = {Bonn, Germany},
   url = {http://hal3.name/docs/#daume05laso}
}
   


Bayesian Summarization at DUC and a Suggestion for Extrinsic EvaluationAbstract     We describe our entry into the Document Understanding Conference competition for evaluating query-focused multi-document summarization systems. Our system is based on a Bayesian Query-Focused Summarization model, similar to the system we entered into the MSE competition. This paper begins by describing the (few) differences between our DUC system and our MSE system and describes our placement in the competition. The remainder of this paper argues in favor of performing extrinsic evaluation of summarization systems, and suggests a method for doing so.
Hal Daumé III, Daniel Marcu
Document Understanding Conference (DUC), 2005@InProceedings{daume05duc,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {Bayesian Summarization at DUC and a Suggestion for Extrinsic Evaluation},
   booktitle = {Proceedings of the Document Understanding Conference (DUC)},
   year = {2005},
   address = {Vancouver, B.C., Canada},
   month = {October 9--10},
   url = {http://hal3.name/docs/#daume05duc}
}


Induction of Word and Phrase Alignments for Automatic Document SummarizationAbstract     Current research in automatic single document summarization is dominated by two effective, yet naïve approaches: summarization by sentence extraction, and headline generation via bag-of-words models. While successful in some tasks, neither of these models is able to adequately capture the large set of linguistic devices utilized by humans when they produce summaries. One possible explanation for the widespread use of these models is that good techniques have been developed to extract appropriate training data for them from existing document/abstract and document/headline corpora. We believe that future progress in automatic summarization will be driven both by the development of more sophisticated, linguistically informed models, as well as a more effective leveraging of document/abstract corpora. In order to open the doors to simultaneously achieving both of these goals, we have developed techniques for automatically producing word-to-word and phrase-to-phrase alignments between documents and their human-written abstracts. These alignments make explicit the correspondences that exist in such document/abstract pairs, and create a potentially rich data source from which complex summarization algorithms may learn. This paper describes experiments we have carried out to analyze the ability of humans to perform such alignments, and based on these analyses, we describe experiments for creating them automatically. Our model for the alignment task is based on an extension of the standard hidden Markov model, and learns to create alignments in a completely unsupervised fashion. We describe our model in detail and present experimental results that show that our model is able to learn to reliably identify word- and phrase-level alignments in a corpus of document/abstract pairs.
Hal Daumé III, Daniel Marcu
Computational Linguistics (CL), 2005@Article{daume05alignments,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {Induction of Word and Phrase Alignments for Automatic Document Summarization},
   journal = {Computational Linguistics (CL)},
   year = {2005},
   month = {December},
   volume = {31},
   number = {4},
   pages = {505--530},
   url = {http://hal3.name/docs/#daume05alignments}
}


Search-Based Structured Prediction as ClassificationAbstract     Solutions to computationally hard problems often require that search be used. Integrating search into the learning phase has been previously proposed in an ad-hoc manner (Daume & Marcu, 2005). In this paper, we show that structured prediction can be mapped into a search setting using language from reinforcement learning, and known techniques for reinforcement learning (Langford et al., 2005) can give formal performance bounds on the structured prediction task.
Hal Daumé III, John Langford, Daniel Marcu
NIPS Workshop on Advances in Structured Learning for Text and Speech Processing (ASLTSP), 2005@InProceedings{daume05search,
   author = {Hal {Daum\'e III} and John Langford and Daniel Marcu},
   title = {Search-Based Structured Prediction as Classification},
   booktitle = {NIPS Workshop on Advances in Structured Learning for Text and Speech Processing (ASLTSP)},
   year = {2005},
   address = {Whistler, Canada},
   url = {http://hal3.name/docs/#daume05search}
}
   


2004

A Phrase-Based HMM Approach to Document/Abstract AlignmentAbstract     We describe a model for creating word-to-word and phrase-to-phrase alignments between documents and their human-written abstracts. Such alignments are critical for the development of statistical summarization systems that can be trained on large corpora of document/abstract pairs. Our model, which is based on a novel Phrase-Based HMM, outperforms both the Cut & Paste alignment model (Jing, 2002) and models developed in the context of machine translation (Brown et al., 1993).
Hal Daumé III, Daniel Marcu
Empirical Methods in Natural Language Processing (EMNLP), 2004@InProceedings{daume04pbhmm,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {A Phrase-Based {HMM} Approach to Document/Abstract Alignment},
   booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
   year = {2004},
   address = {Barcelona, Spain},
   url = {http://hal3.name/docs/#daume04pbhmm}
}


Book Review: Automatic Summarization (I. Mani)
Hal Daumé III
Unpublished, 2004@Misc{daume04mani,
   author = {Hal {Daum\'e III}},
   title = {Book Review: Automatic Summarization (I. Mani)},
   year = {2004},
   howpublished = {Machine Translation Journal},
   keywords = {nlp},
   url = {http://hal3.name/docs/#daume04mani}
}


A Tree-Position Kernel for Document CompressionAbstract     We describe our entry into the DUC 2004 automatic document summarization competition. We competed only in the single document, headline generation task. Our system is based on a novel kernel dubbed the tree position kernel, combined with two other well-known kernels. Our system performs well on white-box evaluations, but does very poorly in the overall DUC evaluation. However, the latter results are offset by the fact that baseline systems consistently outperform well engineered systems.
Hal Daumé III, Daniel Marcu
Fourth Document Understanding Conference (DUC), 2004@InProceedings{daume04treeposition,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {A Tree-Position Kernel for Document Compression},
   booktitle = {Proceedings of the Fourth Document Understanding Conference (DUC)},
   year = {2004},
   address = {Boston, MA},
   month = {May 6 -- 7},
   url = {http://hal3.name/docs/#daume04treeposition}
}


Supervised clustering with the Dirichlet processAbstract     The task of learning to partition data into similar sets occurs frequently in many disciplines. We construct a Bayesian model for learning to partition from labeled data. Our model is based on the nonparametric Dirichlet process prior. Experimental results show that our model is able to outperform existing solutions on real world datasets.
Hal Daumé III, Daniel Marcu
NIPS Workshop on Learning With Structured Outputs (LwSO), 2004@InProceedings{daume04scm,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {Supervised clustering with the Dirichlet process},
   booktitle = {NIPS Workshop on Learning With Structured Outputs (LwSO)},
   year = {2004},
   address = {Whistler, Canada},
   url = {http://hal3.name/docs/#daume04scm}
}


From Zero to Reproducing Kernel Hilbert Spaces in Twelve Pages or Less
Hal Daumé III
Unpublished, 2004@Unpublished{daume04rkhs,
   author = {Hal {Daum\'e III}},
   title = {From Zero to Reproducing Kernel Hilbert Spaces in Twelve Pages or Less},
   note = {Available at \url{http://www.isi.edu/~hdaume/docs/daume04rkhs.ps}},
   month = {February},
   year = {2004}
}


Carefully Approximated Bayes Factors for Feature Selection in MaxEnt Models
Hal Daumé III
Unpublished, 2004@Unpublished{daume04abffs,
   author = {Hal {Daum\'e III}},
   title = {Carefully Approximated {Bayes} Factors for Feature Selection in MaxEnt Models},
   note = {Available at \url{http://www.isi.edu/~hdaume/docs/daume04abffs.ps}},
   month = {November},
   year = {2004}
}


NP Bracketing by Maximum Entropy Tagging and SVM RerankingAbstract     We perform Noun Phrase Bracketing by using a local, maximum entropy-based tagging model, which produces bracketing hypotheses. These hypotheses are subsequently fed into a reranking framework based on support vector machines. We solve the problem of hierarchical structure in our tagging model by modeling underspecified tags, which are fully determined only at decoding time. The tagging model performs comparably to competing approaches and the subsequent reranking increases our system's performance from an F-score of 81.7 to 86.1, surpassing the best reported results to date of 83.8.
Hal Daumé III, Daniel Marcu
Empirical Methods in Natural Language Processing, 2004@InProceedings{daume04bracketing,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {NP Bracketing by Maximum Entropy Tagging and {SVM} Reranking},
   booktitle = {Empirical Methods in Natural Language Processing},
   year = {2004},
   address = {Barcelona, Spain},
   url = {http://hal3.name/docs/#daume04bracketing}
}
   


Web Search Intent Induction via Automatic Query ReformulationAbstract     We present a computationally efficient method for automatic grouping of web search results based on reformulating the original query to alternative queries the user may have intended. The method requires no data other than query logs and the standard inverted indices used by most search engines. Our method outperforms standard web search in the task of enabling users to quickly find relevant documents for informational queries.
Hal Daumé III, Eric Brill
North American Chapter of the Association for Computational Linguistics (NAACL), 2004@InProceedings{daume04intents,
   author = {Hal {Daum\'e III} and Eric Brill},
   title = {Web Search Intent Induction via Automatic Query Reformulation},
   booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)},
   year = {2004},
   address = {Boston, MA},
   url = {http://hal3.name/docs/#daume04intents}
}


Notes on CG and LM-BFGS Optimization of Logistic Regression
Hal Daumé III
Unpublished, 2004@Unpublished{daume04cg-bfgs,
   author = {Hal {Daum\'e III}},
   title = {Notes on {CG} and {LM-BFGS} Optimization of Logistic Regression},
   note = {Paper available at \url{http://hal3.name/docs/#daume04cg-bfgs}, implementation available at \url{http://hal3.name/megam/}},
   month = {August},
   year = {2004}
}


Generic Sentence Fusion is an Ill-Defined Summarization TaskAbstract     We report on a series of human evaluations of the task of sentence fusion. In this task, a human is given two sentences and asked to produce a single coherent sentence that contains only the important information from the original two. Thus, this is a highly constrained summarization task. Our investigations show that even at this restricted level, there is no measurable agreement between humans regarding what information should be considered important. We further investigate the ability of separate evaluators to assess summaries, and find a similarly disturbing lack of agreement.
Hal Daumé III, Daniel Marcu
Text Summarization Branches Out Workshop at ACL (TextSum), 2004@InProceedings{daume04fusion,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {Generic Sentence Fusion is an Ill-Defined Summarization Task},
   booktitle = {Proceedings of the Text Summarization Branches Out Workshop at ACL (TextSum)},
   year = {2004},
   address = {Barcelona, Spain},
   month = {July 25 -- 26},
   url = {http://hal3.name/docs/#daume04fusion}
}


2002

Yet Another Haskell Tutorial
Hal Daumé III
Unpublished, 2002@Unpublished{daume02yaht,
   author = {Hal {Daum\'e III}},
   title = {Yet Another Haskell Tutorial},
   note = {Available at \url{http://hal3.name/docs/#daume02yaht/}},
   year = {2002}
}


The Importance of Lexicalized Syntax Models for Natural Language Generation TasksAbstract     The parsing community has long recognized the importance of lexicalized models of syntax. By contrast, these models do not appear to have had an impact on the statistical NLG community. To prove their importance in NLG, we show that a lexicalized model of syntax improves the performance of a statistical text compression system, and show results that suggest it would also improve the performances of an MT application and a pure natural language generation system.
Hal Daumé III, Kevin Knight, Irene Langkilde-Geary, Daniel Marcu, Kenji Yamada
2002 International Conference on Natural Language Generation (INLG), 2002@InProceedings{daume02lexicalized,
   author = {Hal {Daum\'e III} and Kevin Knight and Irene {Langkilde-Geary} and Daniel Marcu and Kenji Yamada},
   title = {The Importance of Lexicalized Syntax Models for Natural Language Generation Tasks},
   booktitle = {Proceedings of the 2002 International Conference on Natural Language Generation (INLG)},
   year = {2002},
   address = {Harriman, NY},
   month = {July 1 -- 3},
   pages = {9--16},
   url = {http://hal3.name/docs/#daume-lexicalized}
}


A Phrase-Based HMM
Hal Daumé III
Unpublished, 2002@Unpublished{daume02pbhmm,
   author = {Hal {Daum\'e III}},
   title = {A Phrase-Based {HMM}},
   note = {Available at \url{http://www.isi.edu/~hdaume/docs/daume02pbhmm.ps}},
   month = {December},
   year = {2002}
}


GLEANS: A Generator of Logical Extracts and Abstracts for Nice SummariesAbstract     We briefly describe GLEANS, a summarization system that uses four novel techniques for summarizing document collections. (i) GLEANS first maps all documents in a collection into a canonical, database-like representation that makes explicit the main entities and relations in a document collection. (ii) GLEANS also classifies each document collection into one of four categories: collections about a single person, single events, multiple events, and natural disasters. (iii) For each type of document collection, GLEANS also generates from scratch, using predefined templates, the first two sentences in the abstract. (iv) The rest of the summary is then generated by extracting from the database sentences that conform to a set of predefined schemas and by presenting them in an order that reflects coherence constraints specific to each collection category.
Hal Daumé III, Abdesammad Echihabi, Daniel Marcu, Dragos Stefan Munteanu, Radu Soricut
Second Document Understanding Conference (DUC), 2002@InProceedings{daume02gleans,
   author = {Hal {Daum\'e III} and Abdesammad Echihabi and Daniel Marcu and Dragos Stefan Munteanu and Radu Soricut},
   title = {{GLEANS}: A Generator of Logical Extracts and Abstracts for Nice Summaries},
   booktitle = {Proceedings of the Second Document Understanding Conference (DUC)},
   year = {2002},
   address = {Philadelphia, PA},
   month = {July 11 -- 12},
   pages = {9--14},
   url = {http://hal3.name/docs/#daume-gleans}
}


A Noisy-Channel Model for Document CompressionAbstract     We present a document compression system that uses a hierarchical noisy-channel model of text production. Our compression system first automatically derives the syntactic structure of each sentence and the overall discourse structure of the text given as input. The system then uses a statistical hierarchical model of text production in order to drop non-important syntactic and discourse constituents so as to generate coherent, grammatical document compressions of arbitrary length. The system outperforms both a baseline and a sentence-based compression system that operates by simplifying sequentially all sentences in a text. Our results support the claim that discourse knowledge plays an important role in document summarization.
Hal Daumé III, Daniel Marcu
40th Annual Meeting of the Association for Computational Linguistics (ACL), 2002@InProceedings{daume02noisy,
   author = {Hal {Daum\'e III} and Daniel Marcu},
   title = {A Noisy-Channel Model for Document Compression},
   booktitle = {Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL)},
   year = {2002},
   month = {July 6 -- 12},
   address = {Philadelphia, PA},
   pages = {449--456},
   url = {http://hal3.name/docs/#daume02noisy}
}


2001

Asymmetry of CoordinationAbstract     The standard syntactic analysis of coordination gives equal value to both conjoined elements, and treats both elements equivalently. Nonetheless, in many languages (even English), coordination is much more than simply taking two constituents of the same type (or possibly not) and putting a conjunction between them, yielding a trinary branching node. In this paper I begin with an analysis of coordination in general, present cross-linguistic arguments in its favor, and finally discuss how this structure can account for otherwise unexplained raising data.
Hal Daumé III
Unpublished, 2001@Unpublished{daume01coordination,
   author = {Hal {Daum\'e III}},
   title = {Asymmetry of Coordination},
   note = {Available at \url{http://www.isi.edu/~hdaume/docs/daume01coordination.ps}},
   month = {December},
   year = {2001},
   url = {http://hal3.name/docs/#daume01coordination}
}


Integrated Information Management: An Interactive, Extensible Architecture for Information RetrievalAbstract     Most current IR research is focused on specific technologies, such as filtering, classification, entity extraction, question answering, etc. There is relatively little research on merging multiple technologies into sophisticated applications, due in part to the high cost of integrating independently-developed text processing modules. In this paper, we present the Integrated Information Management (IIM) architecture for component-based development of IR applications. The IIM architecture is general enough to model different types of IR tasks, beyond indexing and retrieval.
Eric Nyberg, Hal Daumé III
2001 Human Language Technology Conference (HLT), 2001@InProceedings{daume01iim,
   author = {Eric Nyberg and Hal {Daum\'e III}},
   title = {Integrated Information Management: An Interactive, Extensible Architecture for Information Retrieval},
   booktitle = {Proceedings of the 2001 Human Language Technology Conference (HLT)},
   year = {2001},
   address = {San Diego, CA},
   month = {March 18 -- 21},
   url = {http://hal3.name/docs/#daume-iim}
}


last updated on twenty two april, two thousand thirteen; contact me AT hal3 DOT name