Readings


This is the schedule for Advanced Seminar in Computational Linguistics: Computational Social Science, Fall 2014.

THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the class mailing list or e-mail me for "official" dates.

Students are expected to:


Sep 3.
Sep 10.

Representations of Meaning

Logical Representations

For someone taking a computational angle on logical form, the Hobbs paper does a very nice job introducing key concepts, including, among other things, neo-Davidsonian event-centered representations, de re and de dicto belief reports, and intensional contexts, all with an eye toward "real sentences in English discourse". (For a highly accessible, albeit perhaps a little dated, introduction to Montague-style model theoretic semantics for a general audience, I recommend Emmon Bach's Informal Lectures on Formal Semantics.) In their recent short paper, Rudinger and Van Durme use Hobbs' representation to provide a qualitative discussion of the dependency representation scheme produced by the widely used Stanford Parser.
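
To make the neo-Davidsonian idea concrete, here is a standard textbook-style rendering (not Hobbs's exact notation) of "Brutus stabbed Caesar", with an explicit event variable whose participants are picked out by separate role predicates:

    exists e. [ stab(e) AND Agent(e, Brutus) AND Patient(e, Caesar) ]

Modifiers like "with a knife" then become additional conjuncts over the same event variable, which is much of the representation's appeal.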

In terms of actually building up logical representations, combinatory categorial grammar (CCG) provides a very nice example of a direct correspondence between syntactic and semantic structure. Steedman's short introduction is enough to get the basics of how CCG works, Bos et al. utilize the lambda calculus in building (again neo-Davidsonian) semantic representations from a large-scale CCG parser, and the paper by Zettlemoyer and Collins describes probabilistic CCGs and shows how to learn one from a training set of sentences labeled with lambda-calculus expressions.
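
As a toy illustration of the syntax-semantics correspondence (not drawn from any of the readings), the sketch below mimics CCG forward and backward application using ordinary Python lambdas; the two-word lexicon, the categories, and the sentence are all made up for the example.

    # Minimal, illustrative sketch of CCG-style composition with Python lambdas.
    # The "derivation" applies the verb's semantics to its object and then to
    # its subject, mirroring forward/backward application.
    john = ("NP", "john'")
    mary = ("NP", "mary'")
    sees = ("(S\\NP)/NP", lambda obj: lambda subj: "sees'(%s, %s)" % (subj, obj))

    # Forward application:  (S\NP)/NP  NP  =>  S\NP
    vp = ("S\\NP", sees[1](mary[1]))
    # Backward application: NP  S\NP   =>  S
    s = ("S", vp[1](john[1]))

    print(s[0], ":", s[1])   # S : sees'(john', mary')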

Steedman and Baldridge's Combinatory Categorial Grammar provides a more in-depth summary of CCG, and for a different take on rule-to-rule correspondence see Shieber and Schabes's Synchronous Tree-Adjoining Grammars. Incidentally, for those interested in speech, Steedman has argued for a similar take on intonational structure; see, e.g., this recent paper.


Sep 17.

Semantics in Machine Translation

One would think that machine translation -- the task of taking a meaning in one language and conveying it in another -- would be among the most natural places to find work on semantics in computational linguistics. Indeed, semantic transfer rules have been part of MT for a long time. The idea of interlingual machine translation goes a step further: translate from the source language into a language-independent meaning representation, and then from that representation out into the target language.

We'll start with Bonnie Dorr's work, which is probably the best known attempt at a truly interlingual approach to MT; it also provides a nice introduction to a number of core ideas in lexical semantics.

Work along these lines, and in fact the idea of explicit semantic representations, was largely abandoned in MT as part of the statistical revolution in NLP in the 1990s; the fundamental idea that replaced it was decoding, i.e. search through the space of possible outputs guided by some optimization criterion. Although early statistical MT (notably the IBM models) aimed at optimizing likelihood, MT was revolutionized yet again by the introduction of optimization using shallow but fully automatic meaning-similarity metrics, starting with BLEU and exploding into a cottage industry that includes, among many others, METEOR, TER, and TERp.
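
To give a feel for just how shallow these metrics are, here is a minimal sketch of BLEU's core idea -- modified n-gram precision combined with a brevity penalty -- for a single candidate/reference pair. The smoothing and the toy sentences are my own simplifications, not the official implementation:

    # Minimal single-reference BLEU sketch: modified n-gram precision up to
    # max_n, geometric mean, brevity penalty. Illustrative only.
    import math
    from collections import Counter

    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def bleu(candidate, reference, max_n=2):
        precisions = []
        for n in range(1, max_n + 1):
            cand = Counter(ngrams(candidate, n))
            ref = Counter(ngrams(reference, n))
            overlap = sum(min(c, ref[g]) for g, c in cand.items())
            precisions.append(max(overlap, 1e-9) / max(sum(cand.values()), 1))
        bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
        return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

    print(bleu("the cat sat on the mat".split(), "the cat is on the mat".split()))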

More recently, as the pendulum swings back, semantics has been making a resurgence in the definition of evaluation/optimization criteria, and it is finding its way back into MT systems as well. We'll look at recent work on semantically informed measures of meaning-equivalence in MT, and at the recent movement in the direction of Abstract Meaning Representation (AMR). (Bridging the two, optionally look at smatch, a metric for AMR similarity; also see Flanigan et al. in connection with last week's discussion of semantic parsing.)

Semantic representations

Measuring meaning equivalence


Sep 24.

Network Representations

Network or graph representations have long been used to represent semantic and conceptual knowledge. Early work on "semantic networks" laid the foundation for representations in which concepts appear as nodes, and relations among concepts appear as links; optional background readings here include classic work by Collins and Quillian that argued for the role of semantic networks as a model for human memory. The "read and report" assignment on Rosch's prototype theory concerns another extremely influential view of conceptual categories and how they are taxonomically organized, including both Rosch's seminal paper and an interesting and very recent computational application of its key ideas.
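
As a concrete (and deliberately tiny) illustration, the sketch below encodes a few IS-A links and property attachments in the spirit of the Collins and Quillian inheritance idea; the concepts and properties are made up for the example:

    # Toy semantic network: nodes are concepts, IS-A links point to parents,
    # and properties attach at the most general level where they hold.
    isa = {"canary": "bird", "bird": "animal", "animal": None}
    properties = {"canary": {"can sing"}, "bird": {"has wings"}, "animal": {"breathes"}}

    def has_property(concept, prop):
        # Walk up the IS-A chain until the property is found or the chain ends.
        while concept is not None:
            if prop in properties.get(concept, set()):
                return True
            concept = isa.get(concept)
        return False

    print(has_property("canary", "breathes"))   # True, inherited from "animal"

Collins and Quillian's famous prediction was that verification times should grow with the number of IS-A links that have to be traversed.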

The classic paper by Bill Woods provides important context (and cautions) when it comes to how we think about semantic network representations. WordNet, despite originally having been conceived as more of a tool for human use, has become central as a resource for computational modeling.
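
If you want to poke at WordNet computationally, one convenient route (not part of the assigned readings) is NLTK's interface; the sketch below assumes nltk and its WordNet data have been installed and downloaded:

    # Browsing WordNet via NLTK: synsets, glosses, and IS-A (hypernym) structure.
    from nltk.corpus import wordnet as wn

    dog = wn.synsets("dog", pos=wn.NOUN)[0]   # first noun sense of "dog"
    print(dog.definition())                   # the synset's gloss
    print(dog.hypernyms())                    # immediate parents in the IS-A hierarchy
    print(dog.hypernym_paths()[0])            # one full path up to the root synset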

When it comes to real-world use of ontologies, the medical domain is one of the most successful, in part because of a medical tradition of taxonomizing knowledge that goes back to Linnaeus, and in part because informaticians in medicine have devoted enormous effort to creating knowledge resources with sufficient coverage to actually be useful. In class this week we will have two guests, Tony Davis and Andy Wetta of 3M Health Information Systems, who will be talking with us about applications of ontologies in the healthcare domain. They have pointed to the reading about the Foundational Model of Anatomy (FMA) as useful background for what they'll cover, and as optional reading they also recommend (a) the FrameNet book (first 25 pages or so), (b) the Copestake et al. introduction to minimal recursion semantics, (c) the Wikipedia article on SNOMED-CT, and (d) Rector, Brandt, and Schneider, Getting the foot out of the pelvis, a critique of some of SNOMED-CT's decisions (see also other papers by Barry Smith et al. about SNOMED).

Classic older background on network representations (optional):

Readings:

Read-and-report: Prototypes and Categories


Oct 1.

Grounding Meaning

The essence of semantics is characterizing a relationship between forms and meanings -- as the Wikipedia entry on Semantics puts it, "the relation between signifiers, like words, phrases, signs, and symbols, and what they stand for, their denotation." But if denotations are themselves characterized using other symbols, e.g. a formal logic, or entries in a dictionary, how can meaning ultimately be "grounded out"? Recall that last week we discussed how Miller, in his Introduction to WordNet, explicitly disavows the idea of attempting to create a definitional grounding for WordNet's semantics, instead characterizing its semantics as "differential" in the sense of distinguishing among concepts that its users are already assumed to possess. Meaning in WordNet is therefore intended to be parasitic, so to speak, depending on concepts we already have.

In this next section of the course, we look at different potential ways of "grounding" meaning in computational linguistics. We begin with Harnad's classic paper on the symbol grounding problem, to set context. This week we also read a foundational paper on Latent Semantic Analysis (LSA), which was for some time a cornerstone of work on distributed representations and dimensionality reduction for text, although Latent Dirichlet Allocation (LDA) has overtaken it in popularity in many text analysis settings. (We do not explicitly cover LDA in this course because it is introduced in our Computational Linguistics series; e.g. see the December 2 lecture on Topic Models in Computational Linguistics I.)

Although the term "LSA" is usually associated specifically with a particular form of dimensionality optimization using singular value decomposition (SVD), and although one usually thinks of it largely as a technical method originated in information retrieval (hence "latent semantic indexing", or LSI), the 1997 Psychological Review article by Landauer and Dumais sets LSA in a far broader context. Interestingly, Landauer and Dumais's discussion anticipates recent developments in representation learning using deep and/or recursive neural networks, as, for example, discussed in the paper by Mikolov.

Background

"Grounding" meaning in text: distributed representations

Word/Document Semantics


Oct 8.

Compositional Semantics

In the previous session we discussed the symbol grounding problem, and we looked at the argument by Landauer and Dumais that dimensionality reduction via LSA can provide an alternative approach to "grounding" symbols, one in which vector space word representations emerge through a holistic analysis of co-occurrence relationships rather than necessarily connecting back to sensory stimuli.

The essence of the LSA proposal is to learn representations through a large-scale, inductive process, an idea that they point out is related to earlier neural network approaches, and which has seen a renaissance over the last several years in representation learning using deep neural networks.

In this session we will look at representation learning, beginning with a quick look at Bengio's nice overview and then Mikolov et al.'s analysis of how recurrent neural network (RNN) distributed word representations capture syntactic and semantic regularities. We'll then look at some examples of other work that attempts to go a step further by inducing representations of word meaning that compose to produce larger-grained representations of phrasal meaning.
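
A quick sketch of the vector-offset idea behind those regularity results ("king - man + woman is close to queen"): answer an analogy by finding the nearest cosine neighbor of the offset vector. The tiny vectors below are made-up stand-ins, so the exercise only becomes meaningful with real learned embeddings:

    # Vector-offset analogy: find the word closest to  b - a + c  by cosine.
    import numpy as np

    vecs = {"king":  np.array([0.9, 0.8, 0.1]),
            "man":   np.array([0.7, 0.1, 0.0]),
            "woman": np.array([0.7, 0.1, 0.9]),
            "queen": np.array([0.9, 0.8, 0.9]),
            "apple": np.array([0.0, 0.5, 0.2])}

    def analogy(a, b, c):
        target = vecs[b] - vecs[a] + vecs[c]
        best, best_sim = None, -1.0
        for w, v in vecs.items():
            if w in (a, b, c):
                continue
            sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
            if sim > best_sim:
                best, best_sim = w, sim
        return best

    print(analogy("man", "king", "woman"))   # "queen" with these toy vectors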


Oct 15.

"Grounding" meaning in text: machine reading

In the previous class or two, we looked at the idea of "grounding" text in text itself -- the corpus, essentially, as a proxy for the world. Up to this point, however, we have looked at distributed representations derived from text in a way that is for the most part divorced from higher-level questions of reasoning. In contrast, "machine reading" attempts to extract knowledge from text, with an interest not just in representations but in the ability to reason with those representations.

The overall goals for this process have existed for decades -- for example, the early work of Bill Woods (see class September 24), and the semantic network representations he discussed, were largely motivated by the desire to have computers acquire knowledge by reading, inspired by what people do, and then reason about it, answer questions, etc. (Hence a lot of the discussion we saw, and had, about representing factual assertions about instances like Mary's dog (OWNS(p,d) AND NAME-OF(d,"Fido")) rather than general facts about categories (IS-A(DOG,MAMMAL)).) However, it is largely in the last 5-10 years that we have seen real critical mass for machine reading approaches.

Today's trio of papers looks at this pursuit from three angles: IBM's Watson (DeepQA), which adopted the interesting strategy of stimulating progress by working on a very public, very clearly defined task with only moderate domain-level constraints; Etzioni and colleagues' "open information extraction" approach, which minimizes prior domain knowledge requirements and focuses on quantity and correctness of learned facts rather than a specific task; and DARPA's Machine Reading program, where the goal is not to capture all the knowledge in a corpus but to focus on reasoning tasks where knowledge in unstructured text is one essential piece of the puzzle, encouraging adaptability via a series of reading tasks that progress from one domain to the next.
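
To make "open information extraction" slightly less abstract, the sketch below pulls (argument, relation, argument) triples out of text with a crude surface pattern; real systems such as ReVerb use part-of-speech patterns and confidence scores, so this is only an illustration of the output format, on made-up sentences:

    # Crude illustration of open-IE-style output: (arg1, relation, arg2) triples.
    import re

    text = "Barack Obama was born in Hawaii. Edison invented the phonograph."
    pattern = re.compile(r"([A-Z][\w ]*?) (was born in|invented) ([\w ]+?)\.")

    for arg1, rel, arg2 in pattern.findall(text):
        print((arg1.strip(), rel, arg2.strip()))
    # ('Barack Obama', 'was born in', 'Hawaii')
    # ('Edison', 'invented', 'the phonograph')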

Some other things on this topic that are worth looking at include:


Oct 22.

Grounding meaning in restricted worlds

One could argue that in the previous class or two discussing grounding, we weren't really talking about grounding -- after all, the work we discussed was about creating representations of meaning from text, not connecting text to meaning in the sense of actual things in the real world.

This week we start discussing approaches that more explicitly connect language to the world. We begin with approaches that start small, in the sense of working with highly restricted subsets of the world, rather than the world at large. This idea dates back to the very early days of artificial intelligence, when Minsky and Papert suggested that AI research should start with artificial "micro-worlds", of which perhaps the most famous example is Winograd's blocks world.

In its modern incarnation, work along these lines falls under the general heading of "grounded language learning" (see a nice overview talk on grounded language learning by Ray Mooney at AAAI 2013). Some examples of restricted domains where this has been explored include simulated robotic soccer games (see a fun video for the non-simulated version), simple visual worlds (much like blocks world), mathematical proofs, and weather forecasts. Today's class will be done in lecture mode without expecting detailed reading-ahead, but here are the primary references I'm drawing on, in addition to Ray Mooney's AAAI 2013 keynote mentioned above:


Oct 29.

Grounding meaning in non- (or at least less-)restricted electronic worlds

Last class was about using controlled or restricted worlds as a sandbox for exploring computational models connecting language with the world. But, although much of that work does develop foundational ideas and methods, who wants to spend all their time in a sandbox?

Recently there has been quite a bit of very interesting work exploring the grounding of language in less restricted settings, using what I think of as an expanded notion of "real world". If you think about it, we're spending so much of our time online that the lines between the virtual universe and the physical universe are getting blurry. We now have unprecedented access to the "digital traces" of individual behavior (see Lazer et al. 2009 for nice discussion), including, for example, records of our personal thoughts and feelings, our social interactions, and even our locations. At the same time, "world knowledge" -- always the Achilles heel of artificial intelligence -- is now being aggregated online in semi-structured forms like Wikipedia and more structured forms like Freebase.

Today we will start with the latter category, work that can be seen as grounding language in online information that does not come from a "micro" version of the world.

Read-and-report 3: The Semantic Web


Nov 5.

Grounding in visual input

Last class, we talked about grounding language in the electronic world of Web services and online knowledge, and we took a step toward the physical world by looking at connections between language and geographical location in the form of electronic geolocation information. Today we move more firmly to grounding of language in the real, physical world by looking at work on grounding language in visual input.

The short paper by Deb Roy in Trends in Cognitive Sciences provides a bit of context-setting. Then we move to some recent deep learning work on computational semantics connecting language with images. Finally, we'll talk about a very interesting then-and-now pair of papers: one by Jeff Siskind in 1990 and another by Siskind and his student Haonan Yu, showing where his thinking and work have gotten to more than two decades later. (The latter won an ACL best paper award.)


Nov 12.

Modeling human acquisition of semantics

The heading here is, of course, impossibly broad. But today we'll look at some work related to the most amazing of computational devices for learning meaning: the human child. Siskind's 1990 paper from last week fits into that category: he was expressly interested in computational modeling that would make "only linguistically and cognitively plausible assumptions".

Today we begin with Deb Roy -- his TED talk provides a nice, high-level overview of some remarkable work he did on instrumentation for gathering relevant data, and it also articulates the connection between that work and practical applications, which led to the creation of Bluefin Labs (acquired in February 2013 by Twitter). We'll also discuss his Cogsci 2012 paper, as an example of the kind of analysis he's done with the collected data. (See his 2002 article in Computer Speech and Language, "Learning visually grounded words and syntax for a scene description task", for representative earlier work.)

We then move to a different aspect of acquisition, the ability to connect words in the input to their semantic roles. This problem is intimately connected to the problem of verb learning -- that is, identifying the particular verb concept to which a word refers -- and that question is itself closely tied to the question of how verb semantics is represented. (If that makes you think about earlier readings we did from Bonnie Dorr and Jeff Siskind, that's good.) Those not already familiar with work in that area might want to consider reading over the short (6-page) overview of syntactic bootstrapping by Fisher et al. 2010, and if you have the time I would also recommend the Màrquez et al. overview of semantic role labeling.
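
For anyone who hasn't seen semantic roles before, here is a tiny illustration of what a role analysis assigns to a sentence; the role inventory and the sentence are simplified and made up for the example:

    # Simplified predicate-argument structure for "The girl gave the dog a bone."
    srl = {"predicate": "give",
           "Agent": "the girl",
           "Recipient": "the dog",
           "Theme": "a bone"}

    print("%s(Agent=%s, Theme=%s, Recipient=%s)" %
          (srl["predicate"], srl["Agent"], srl["Theme"], srl["Recipient"]))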

Because Connor et al. and Perfors et al. are long, I don't think we can squeeze in another reading for today. If we were going to, however, the ones I'd recommend would be:


Nov 19.

Continuation of previous class.


Nov 26.

No class -- have a great Thanksgiving!


Dec 3.

Beyond Denotations: Pragmatic inferences in language understanding

This semester we have been looking primarily at the relationship between language and the underlying meaning expressed by that language. Today we look more broadly at "meaning", going beyond literal meanings or denotations to things that are "meant" in a broader sense, even if the meaning is not fully carried by the utterance itself. Hobbs's theory of interpretation as abduction is a classic computational approach that treats the construction of meaning as a process of reasoning (hence including world knowledge) to obtain the best explanation for an utterance. We also look at implicature, with one reading providing broad background and another examining in particular the treatment of opinion implicatures proposed in work in progress by Wiebe and Deng.
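
As a very stripped-down sketch of the abductive shape of interpretation (far simpler than Hobbs's weighted abduction, with made-up rules and costs): given something to explain and rules linking possible assumptions to it, pick the cheapest assumption that would account for it.

    # Toy "interpretation as abduction": explain an observation by assuming
    # the lowest-cost antecedent among the rules that could produce it.
    rules = {"smoke": [("fire", 1.0)],
             "wet_grass": [("rain", 1.0), ("sprinkler", 2.0)]}

    def explain(observation):
        candidates = rules.get(observation, [])
        # If nothing explains it, fall back to assuming the observation itself.
        return min(candidates, key=lambda c: c[1]) if candidates else (observation, 0.0)

    print(explain("wet_grass"))   # ('rain', 1.0): the cheaper of two explanations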

Although we probably won't have time to discuss it in this class, if you're interested in pragmatic inferences you should look at the problem of recognizing textual entailment. Here is a comprehensive treatment of the topic (full book).


Dec 10.


Philip Resnik, Professor
Department of Linguistics and Institute for Advanced Computer Studies


Department of Linguistics
1401 Marie Mount Hall            UMIACS phone: (301) 405-6760       
University of Maryland           Linguistics phone: (301) 405-8903
College Park, MD 20742 USA	   Fax: (301) 314-2644 / (301) 405-7104
http://umiacs.umd.edu/~resnik	   E-mail: resnik AT umd _DOT.GOES.HERE_ edu