Holistic sentiment analysis across languages: multilingual supervised latent Dirichlet allocation

Title: Holistic sentiment analysis across languages: multilingual supervised latent Dirichlet allocation
Publication Type: Conference Papers
Year of Publication: 2010
Authors: Boyd-Graber J, Resnik P
Conference Name: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Date Published: 2010
Publisher: Association for Computational Linguistics
Conference Location: Stroudsburg, PA, USA
Abstract

In this paper, we develop multilingual supervised latent Dirichlet allocation (MlSLDA), a probabilistic generative model that allows insights gleaned from one language's data to inform how the model captures properties of other languages. MlSLDA accomplishes this by jointly modeling two aspects of text: how multilingual concepts are clustered into thematically coherent topics and how topics associated with text connect to an observed regression variable (such as ratings on a sentiment scale). Concepts are represented in a general hierarchical framework that is flexible enough to express semantic ontologies, dictionaries, clustering constraints, and, as a special, degenerate case, conventional topic models. Both the topics and the regression are discovered via posterior inference from corpora. We show MlSLDA can build topics that are consistent across languages, discover sensible bilingual lexical correspondences, and leverage multilingual corpora to better predict sentiment.
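
To make the second aspect of the model concrete, the following is a minimal sketch of how a supervised topic model can tie a document's topics to a numeric rating. It assumes the standard supervised LDA formulation, in which the response is Gaussian with a mean given by regression weights applied to the document's empirical topic frequencies; the abstract does not spell out this form, so treat it as an assumption. The function names, the weights `eta`, the noise level `sigma`, and the toy topic assignments are all hypothetical and chosen only for illustration.

```python
import random


def mean_topic_proportions(topic_assignments, num_topics):
    """Return the empirical topic frequencies of one document."""
    counts = [0] * num_topics
    for z in topic_assignments:
        counts[z] += 1
    total = len(topic_assignments)
    return [c / total for c in counts]


def expected_response(topic_assignments, eta):
    """Expected rating: regression weights dotted with topic frequencies."""
    z_bar = mean_topic_proportions(topic_assignments, len(eta))
    return sum(w * p for w, p in zip(eta, z_bar))


def sample_response(topic_assignments, eta, sigma, rng):
    """Draw a rating from a Gaussian centred on the expected response."""
    return rng.gauss(expected_response(topic_assignments, eta), sigma)


# Hypothetical toy values: three topics and a four-token document.
eta = [2.0, -1.0, 0.5]          # assumed per-topic regression weights
doc_topics = [0, 0, 1, 2]       # assumed topic index for each token

print(mean_topic_proportions(doc_topics, len(eta)))
print(expected_response(doc_topics, eta))
print(sample_response(doc_topics, eta, 0.5, random.Random(0)))
```

Because the topic indices carry no language information, the same weights would score a document in any language once its tokens are assigned to the shared topics, which is one way to read the abstract's claim that multilingual corpora can improve sentiment prediction.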

URL: http://dl.acm.org/citation.cfm?id=1870658.1870663