Yuval Marton
Top | Publications | Talks | Teaching | Academic Activities | Other activities | Bottom
Email: ymarton @t ccls.columbia.edu
I am a post-doctoral researcher at the Columbia University Center for Computational Learning Systems (CCLS). I work with Nizar Habash on syntactic parsing, focusing on Arabic parsing for statistical machine translation (SMT).
I was a linguistics Ph.D. student at UMD, focusing on computational linguistics. My advisors were Philip Resnik and Amy Weinberg. I defended my dissertation in September 2009. My dissertation, entitled “Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models”, explored using soft syntactic and semantic constraints in end-to-end state-of-the-art statistical machine translation systems. It also introduced a novel distributional paraphrase generation technique that can benefit from the soft semantic constraints, and presented a generalized framework of which these soft semantic and syntactic constraints can be viewed as instances, and in which they can be potentially combined.
My research interests include statistical machine translation (SMT) and lexical semantics, in particular corpus-based semantic similarity measures applied to word-pair similarity ranking and paraphrase generation. More specifically, I am interested in infusing SMT and semantic measures with linguistic knowledge by incorporating soft syntactic and/or soft semantic constraints into various corpus-based models. I was also involved in text classification research (authorship attribution and topic/genre classification). I was a member of the CLIP Lab at UMIACS, and I also frequent the CNL Lab.
Following my interests in neuro-biologically plausible cognitive and linguistic models, I took several fascinating neuroscience courses at the Neuroscience and Cognitive Science (NACS) Program, and received the NACS Certificate. My qualifying paper focused on visual word recognition. I argued there for a lexical representation that consists of both lower-level visual features and higher-level abstract letter objects, interacting with statistical factors (word frequency) and partly-innate factors (left or right visual field perception). I continue to do research in this area with Carol Whitney.
Before that, my advisor was Lisa Hellerstein, when I was a computer science graduate student at the Polytechnic Institute of NYU (formerly Polytechnic University, Brooklyn, NY), where I received my Master's degree in Computer Science.
Yuval Marton, Chris Callison-Burch and Philip Resnik. “Improved Statistical Machine Translation Using Monolingually-derived Paraphrases”. Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, August 6-7, 2009. Full paper.
Untranslated words still constitute a major problem for Statistical Machine Translation (SMT), and current SMT systems are limited by the quantity of parallel training texts. Augmenting the training data with paraphrases generated by pivoting through other languages alleviates this problem, especially for the so-called "low density" languages. But pivoting requires additional parallel texts. We address this problem by deriving paraphrases monolingually, using distributional semantic similarity measures, thus providing access to larger training resources, such as comparable and unrelated monolingual corpora. We present what is to our knowledge the first successful integration of a collocational approach to untranslated words with an end-to-end, state-of-the-art SMT system, demonstrating significant translation improvements in a low-resource setting.
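As a rough illustration of what a distributional semantic similarity measure looks like in general (not the implementation used in the paper), the sketch below builds a co-occurrence profile for each word and compares profiles by cosine. The raw-count weighting, the window size, the function names `build_profiles` and `cosine`, and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def build_profiles(tokens, window=1):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    profiles = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                profiles[word][tokens[j]] += 1
    return profiles


def cosine(p, q):
    """Cosine similarity between two co-occurrence Counters (0.0 if either is empty)."""
    dot = sum(count * q[context] for context, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Toy corpus (hypothetical): "cat" and "dog" occur in the same contexts.
tokens = ("the cat drinks milk the dog drinks water "
          "the cat eats fish the dog eats meat").split()
profiles = build_profiles(tokens, window=1)
print(cosine(profiles["cat"], profiles["dog"]))
print(cosine(profiles["cat"], profiles["milk"]))
```

In this toy corpus "cat" and "dog" share identical contexts, so they score higher against each other than "cat" does against "milk"; words with similar profiles are the kind of candidates a distributional paraphrase method would consider.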
Yuval Marton, Saif Mohammad and Philip Resnik. “Estimating Semantic Distance Using Soft Semantic Constraints in Knowledge-Source / Corpus Hybrid Models”. Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, August 6-7, 2009. Full paper.
We propose a corpus–thesaurus hybrid method that uses soft constraints to generate word-sense disambiguated distributional profiles (DPs) from coarser "concept DPs" (derived from a small Roget-like thesaurus) and sense-unaware traditional word DPs (derived from raw text). Not relying on a large lexical resource makes this method suitable also for resource-poorer languages or specific domains. Although it uses a knowledge source, the method is not vocabulary-limited: if the target word is not in the thesaurus, the method falls back gracefully on the word’s co-occurrence information. Experiments on word-pairs ranking by semantic distance show the new hybrid method to be superior to others.
David Chiang, Yuval Marton and Philip Resnik. “Online Large-Margin Training of Syntactic and Structural Translation Features”. Conference on Empirical Methods in Natural Language Processing (EMNLP 2008). Waikiki, Honolulu, Hawaii, October 25-27, 2008. Full paper.
Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative to MERT.
We first show that by parallel processing and exploiting more of the parse forest, we can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost. We then test the method on two classes of features that address deficiencies in the Hiero hierarchical phrase-based model: first, we simultaneously train a large number of Marton and Resnik’s soft syntactic constraints, and, second, we introduce a novel structural distortion model. In both cases we obtain significant improvements in translation performance. Optimizing them in combination, for a total of 56 feature weights, we improve performance by 2.6 BLEU on a subset of the NIST 2006 Arabic-English evaluation data.
Yuval Marton and Philip Resnik. “Soft Syntactic Constraints for Hierarchical Phrased-Based Translation”. The 46th Annual Meeting of the Association for Computational Linguistics (ACL). Columbus, Ohio, June 16-18, 2008. Full paper.
In adding syntax to statistical MT, there is a tradeoff between taking advantage of linguistic analysis, versus allowing the model to exploit linguistically unmotivated mappings learned from parallel training data. A number of previous efforts have tackled this tradeoff by starting with a commitment to linguistically motivated analyses and then finding appropriate ways to soften that commitment. We present an approach that explores the tradeoff from the other direction, starting with a context-free translation model learned directly from aligned parallel text, and then adding soft constituent-level constraints based on parses of the source language. We obtain substantial improvements in performance for translation from Chinese and Arabic to English.
Yuval Marton, Ning Wu, and Lisa Hellerstein. "On Compression-Based Text Classification". Proceedings of the 27th European Conference on Information Retrieval (ECIR), Spain, March 2005. Abstract. Full paper. Errata note.
Compression-based text classification methods are easy to apply, requiring virtually no preprocessing of the data. Most such methods are character-based, and thus have the potential to automatically capture non-word features of a document, such as punctuation, word-stems, and features spanning more than one word. However, compression-based classification methods have drawbacks (such as slow running time), and not all such methods are equally effective. We present the results of a number of experiments designed to evaluate the effectiveness and behavior of different compression-based text classification methods on English text. Among our experiments are some specifically designed to test whether the ability to capture non-word (including super-word) features causes character-based text compression methods to achieve more accurate classification.
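To make the general idea concrete, here is a minimal sketch of one common compression-based classification scheme: a document is assigned to the class whose training text lets it compress best. This is only an illustration under stated assumptions, not the specific methods or compressors evaluated in the paper; the use of `zlib`, the function names `compressed_size` and `classify`, and the toy training data are all choices made for this example.

```python
import zlib
from typing import Dict


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs for the UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def classify(document: str, training_text_by_label: Dict[str, str]) -> str:
    """Return the label whose training text makes `document` cheapest to encode.

    The cost of a label is the increase in compressed size when the document
    is appended to that label's training text.
    """
    def extra_bytes(label: str) -> int:
        training_text = training_text_by_label[label]
        return (compressed_size(training_text + " " + document)
                - compressed_size(training_text))

    return min(training_text_by_label, key=extra_bytes)


# Toy training data (hypothetical), one concatenated text per class.
training = {
    "pets": "the cat sat on the mat " * 20,
    "finance": "stock prices fell sharply today " * 20,
}
print(classify("the cat sat on the mat", training))
print(classify("stock prices fell sharply", training))
```

Note that the sketch works on raw characters with no tokenization or feature extraction, which is the "virtually no preprocessing" property the abstract describes; it also recompresses each class's training text for every document, which hints at why such methods can be slow.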
Chris Dyer, Hendra Setiawan, Yuval Marton, and Philip Resnik. “The University of Maryland Statistical Machine Translation System for the Fourth Workshop on Machine Translation”. EACL 2009 Fourth Workshop On Statistical Machine Translation, March 2009, Athens, Greece. Short paper.
This paper describes the techniques we explored to improve the translation of news text in the German-English and Hungarian-English tracks of the WMT09 shared translation task. Beginning with a conventional hierarchical phrase-based system, we found benefits for using word segmentation lattices as input, explicit generation of beginning and end of sentence markers, minimum Bayes risk decoding, and incorporation of a feature scoring the alignment of function words in the hypothesized translation. We also explored the use of monolingual paraphrases to improve coverage, as well as co-training to improve the quality of the segmentation lattices used, but these did not lead to improvements.
Hybrid Knowledge/Corpus-based Semantic Distance Measures and Paraphrase Generation with Soft Constraints. Invited talk, IBM, Hawthorne, NY. July, 2009.
Fine-Grained Soft Semantic Constraints. Invited talk, University of Manchester (by video-conference). June, 2009.
Carol Whitney and Yuval Marton. Perceptual Patterns in Letter-String Processing. Conference of the European Society for Cognitive Psychology (ESCOP) XVI, Krakow, Poland, September 2-5, 2009. Accepted.
The SERIOL model (Whitney, 2001) proposes that left-to-right lateral inhibition within retinotopic RH areas is a learned, string-specific mechanism necessary for encoding letter order (for a language read from left to right). We review the theory behind this claim and present supporting data from a trigram identification experiment, which demonstrates the predicted differential effect of within-string position across visual fields. We contrast the SERIOL account with a recent alternative model of perceptual patterns specific to string processing (Tydgat & Grainger, in press).
Yuval Marton. “Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models”. Ph.D. Dissertation, Department of Linguistics, University of Maryland, October 2009. Official format or paper-saving single-space format.
This dissertation focuses on effective combination of data-driven natural language processing (NLP) approaches with linguistic knowledge sources that are based on manual text annotation or word grouping according to semantic commonalities. I gainfully apply fine-grained linguistic soft constraints – of syntactic or semantic nature – on statistical NLP models, evaluated in end-to-end state-of-the-art statistical machine translation (SMT) systems. The introduction of semantic soft constraints involves intrinsic evaluation on word-pair similarity ranking tasks, extension from words to phrases, application in a novel distributional paraphrase generation technique, and an introduction of a generalized framework of which these soft semantic and syntactic constraints can be viewed as instances, and in which they can be potentially combined.
Fine granularity is key in the successful combination of these soft constraints, in many cases. I show how to softly constrain SMT models by adding fine-grained weighted features, each preferring translation of only a specific syntactic constituent. Previous attempts using coarse-grained features yielded negative results. I also show how to softly constrain corpus-based semantic models of words (“distributional profiles”) to effectively create word-sense-aware models, by using semantic word grouping information found in a manually compiled thesaurus. Previous attempts, using hard constraints and resulting in aggregated, coarse-grained models, yielded lower gains.
A novel paraphrase generation technique incorporating these soft semantic constraints is then also evaluated in an SMT system. This paraphrasing technique is based on the Distributional Hypothesis. The main advantage of this novel technique over current “pivoting” techniques for paraphrasing is the independence from parallel texts, which are a limited resource. The evaluation is done by augmenting translation models with paraphrase-based translation rules, where fine-grained scoring of paraphrase-based rules yields significantly higher gains.
The model augmentation includes a novel semantic reinforcement component: In many cases there are alternative paths of generating a paraphrase-based translation rule. Each of these paths reinforces a dedicated score for the “goodness” of the new translation rule. This augmented score is then used as a soft constraint, in a weighted log-linear feature, letting the translation model learn how much to “trust” the paraphrase-based translation rules.
The work reported here is the first to use distributional semantic similarity measures to improve performance of an end-to-end phrase-based SMT system. The unified framework for statistical NLP models with soft linguistic constraints enables, in principle, the combination of both semantic and syntactic constraints – and potentially other constraints, too – in a single SMT model.
Yuval Marton. “What Can we Learn about Language Processing and Representation from Word Contour Effects on Letter Order Perception and Word Recognition in Right and Left Visual Fields?” Qualifying paper (Ling895), Department of Linguistics, University of Maryland, May 2007. Manuscript.
TA for Computational Linguistics II (Ling647 / CMSC828R), taught by Philip Resnik, Spring 2006.
TA for Introductory Linguistics (Ling200), taught by Tonia Bleam, Spring 2008.
I took part (or am still taking part) in the following:
NACS Program (The Program in Neuroscience and Cognitive Science at the University of Maryland): I received the NACS Program Certificate in August 2008.
Reviewer: ACM TALIP Journal 2006. (Association for Computing Machinery: Transactions on Asian Language Information Processing)
Psycholinguistic experiments at the CNL Lab.
The Machine Translation MURI project, Spring 2006 – Present.
Colloquium Committee, member, Fall 2005 – Spring 2006.
Semantics Search Committee, member, Fall 2005 – Spring 2006.
Human translation: Example.
GSG (Graduate Student Government) Rep, Linguistics, Fall 2005 – Spring 2006.
GSG Student Affairs Committee, member, Fall 2005 – Spring 2006.
Grammar Society group, President, Fall 2005 – Spring 2006.
LGSA (the Linguistics Graduate Students Association) Rep, Spring 2005 – Fall 2005.