Hal Daumé III

I am Hal Daumé III, an Assistant Professor in Computer Science (also UMIACS and Linguistics) at the University of Maryland; I was previously in the School of Computing at the University of Utah (CV). Although I'd like to be known for my research in language (computational linguistics and natural language processing) and machine learning (structured prediction, domain adaptation and Bayesian methods), I am probably best known for my NLPers blog. I associate myself most with conferences like ACL, ICML, EMNLP and NIPS.

The best way to reach me is by email at me AT hal3 DOT name. I cannot reply to all emails from prospective students; please read this to ensure that I read your email. For pressing matters, please visit me in person at AVW 3227 or call my office at 301-405-1073.


Recent publications:

  • Computational methods are invaluable for typology, but the models must match the questions: Commentary on Dunn et al. (2011) [RL,HD] (2011)
    @article{daume11dunn,
       author = {Roger Levy and Hal {Daum\'e III}},
       title = {Computational methods are invaluable for typology, but the models must match the questions: Commentary on Dunn et al. (2011)},
       journal = {Linguistic Typology},
       year = {2011},
       url = {http://hal3.name/docs/#daume11dunn}
    }
  • Corpus-Guided Sentence Generation of Natural Images [YY,CT,HD,YA] (EMNLP 2011)
    Abstract: We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The input are initial noisy estimates of the objects and scenes detected in the image using state of the art trained detectors. As predicting actions from still images directly is unreliable, we use a language model trained from the English Gigaword corpus to obtain their estimates; together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters on a HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.
    @InProceedings{daume11generation,
       author = {Yezhou Yang and Ching Lik Teo and Hal {Daum\'e III} and Yiannis Aloimonos},
       title = {Corpus-Guided Sentence Generation of Natural Images},
       booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
       year = {2011},
       address = {Edinburgh, Scotland},
    }
  • Beam Search based MAP Estimates for the Indian Buffet Process [PR,HD] (ICML 2011)
    @InProceedings{daume11ibpsearch,
       author = {Piyush Rai and Hal {Daum\'e III}},
       title = {Beam Search based MAP Estimates for the Indian Buffet Process},
       booktitle = {International Conference on Machine Learning (ICML)},
       year = {2011},
       address = {Bellevue, WA},
       url = {http://hal3.name/docs/#daume11ibpsearch}
    }
  • A Co-training Approach for Multiview Spectral Clustering [AK,HD] (ICML 2011)
    @InProceedings{daume11cospec,
       author = {Abhishek Kumar and Hal {Daum\'e III}},
       title = {A Co-training Approach for Multiview Spectral Clustering},
       booktitle = {International Conference on Machine Learning (ICML)},
       year = {2011},
       address = {Bellevue, WA},
       url = {http://hal3.name/docs/#daume11cospec}
    }
  • Improving Bilingual Projections via Sparse Covariance Matrices [JJ,RU,HD,AB] (EMNLP 2011)
    @InProceedings{daume11sparse,
       author = {Jagadeesh Jagarlamudi and Raghavendra Udupa and Hal {Daum\'e III} and Abhijit Bhole},
       title = {Improving Bilingual Projections via Sparse Covariance Matrices},
       booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
       year = {2011},
       address = {Edinburgh, Scotland},
    }
  • Approximate Scalable Bounded Space Sketch for Large Data NLP [AG,HD] (EMNLP 2011)
    @InProceedings{daume11sketch,
       author = {Amit Goyal and Hal {Daum\'e III}},
       title = {Approximate Scalable Bounded Space Sketch for Large Data {NLP}},
       booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
       year = {2011},
       address = {Edinburgh, Scotland},
       url = {http://hal3.name/docs/#daume11sketch}
    }
  • Lossy Conservative Update (LCU) sketch: Succinct approximate count storage [AG,HD] (AAAI 2011)
    @InProceedings{daume11lcu,
       author = {Amit Goyal and Hal {Daum\'e III}},
       title = {Lossy Conservative Update ({LCU}) sketch: Succinct approximate count storage},
       booktitle = {Conference on Artificial Intelligence (AAAI)},
       year = {2011},
       address = {Portland, OR},
       url = {http://hal3.name/docs/#daume11lcu}
    }
  • Online Learning of Multiple Tasks and Their Relationships [AS,PR,HD,SV] (AI-Stats 2011)
    Abstract: We propose an Online MultiTask Learning (OMTL) framework which simultaneously learns the task weight vectors as well as the task relatedness adaptively from the data. Our work is in contrast with prior work on online multitask learning which assumes fixed task relatedness, a priori. Furthermore, whereas prior work in such settings assume only positively correlated tasks, our framework can capture negative correlations as well. Our proposed framework learns the task relationship matrix by framing the objective function as a Bregman divergence minimization problem for positive definite matrices. Subsequently, we exploit this adaptively learned task-relationship matrix to select the most informative samples in an online multitask active learning setting. Experimental results on a number of real-world datasets and comparisons with numerous baselines establish the efficacy of our proposed approach.
    @InProceedings{daume11olmt,
       author = {Avishek Saha and Piyush Rai and Hal {Daum\'e III} and Suresh Venkatasubramanian},
       title = {Online Learning of Multiple Tasks and Their Relationships},
       booktitle = {Conference on Artificial Intelligence and Statistics (AI-Stats)},
       year = {2011},
       address = {Ft. Lauderdale, FL},
       url = {http://hal3.name/docs/#daume11olmt}
    }
  • Domain Adaptation for Machine Translation by Mining Unseen Words [HD,JJ] (2011)
    Abstract: We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches to integrating such translations into a phrasebased translation system, yielding consistent improvements in translations quality (between 0.5 and 1.5 Bleu points) on four domains and two language pairs.
    @InProceedings{daume11lexicaladapt,
       author = {Hal {Daum\'e III} and Jagadeesh Jagarlamudi},
       title = {Domain Adaptation for Machine Translation by Mining Unseen Words},
       booktitle = {Association for Computational Linguistics},
       year = {2011},
       address = {Portland, OR},
       url = {http://hal3.name/docs/#daume11lexicaladapt}
    }
  • From Bilingual Dictionaries to Interlingual Document Representations [JJ,HD,RU] (2011)
    Abstract: Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to learn an interlingual representation, making them sensitive to the domain of the training data. In this paper, we learn an interlingual representation in an unsupervised manner using only a bilingual dictionary. We first use the bilingual dictionary to find candidate document alignments and then use them to find an interlingual representation. Since the candidate alignments are noisy, we develop a robust learning algorithm to learn the interlingual representation. We show that bilingual dictionaries generalize to different domains better: our approach gives better performance than either a word by word translation method or Canonical Correlation Analysis (CCA) trained on a different domain.
    @InProceedings{daume11interlingual,
       author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III} and Raghavendra Udupa},
       title = {From Bilingual Dictionaries to Interlingual Document Representations},
       booktitle = {Association for Computational Linguistics},
       year = {2011},
       address = {Portland, OR},
       url = {http://hal3.name/docs/#daume11interlingual}
    }

Recent teaching:


Students:

Prospective students:
  • Read this and email me after taking machine learning and/or NLP about potential research.

Current students:

Past students:

  • Adam Teichert (MS 2009 at Utah, now PhD student at JHU)
  • Scott Alfeld (BS 2008 at Utah, now PhD student at USC)

Upcoming Conferences

(bold = plan to attend):

Conference   Location              Due Date   Notification   Conference Dates
AISTATS 11   Ft. Lauderdale, FL    Past       28 Feb         11-13 Apr
ACL 11       Portland, OR          Past       11 Feb         20-24 Jun
CoNLL 11     Portland, OR          04 Mar     11 Apr         23-24 Jun
ICML 11      Bellevue, WA          01 Feb     19 Apr         28 Jun-01 Jul
UAI 11       Barcelona, Spain      18 Mar     01 Jun         14-17 Jul
IJCAI 11     Barcelona, Spain      Past       31 Mar         16-22 Jul
EMNLP 11     Edinburgh, Scotland   23 Mar     24 May         27-31 Jul
AAAI 11      San Francisco, CA     08 Feb     15 Apr         07-11 Aug
KDD 11       San Diego, CA         18 Feb     29 Apr         21-24 Aug
NIPS 11      Granada, Spain        01 Jun     ??             12-17 Dec

last updated on twenty four january, two thousand twelve; contact me AT hal3 DOT name