Advanced Seminar in Computational Linguistics: Computational Social Science


Philip Resnik


Linguistics conference room, 1401 Marie Mount Hall, Wednesdays 2-4:30pm.

    Note: The content of 800-level seminars varies from year to year, and this seminar will not cover the same material as previous offerings of Ling848 (Seminar in Computational Linguistics), although it does overlap with my Fall 2009 seminar. If you have questions about the seminar, including whether your background is appropriate for attending, please contact me.

    Back in 2009, Mark Liberman argued on Language Log that "corpus based social science" was poised to go mainstream, despite a general historical tendency toward "linguistic anemia" in the social sciences. As computational linguists, we have known for the past several decades that language use in corpora can serve as a useful proxy for world knowledge. With so much of people's lives going online, it was time to start imagining technology that would exploit large-scale language use as evidence for people's individual properties, behaviors, and social interactions, as well.

    Since then, it has become clear that Mark was right. A September 9, 2013 Time magazine article entitled "What Twitter Says to Linguists" mentions that "upwards of 150 Twitter-based studies have come out in 2013 so far", and has a nice quote from Jacob Eisenstein, whose work is featured in the article: "Language is really a window into people's sense of personal identity".

    Now, "social science" is an unmanageably huge topic. The Wikipedia page on Social Science includes disciplines ranging from archaeology to social work. In this seminar, we'll narrow the field somewhat. I'm particularly interested in the idea of perspective, in the Oxford Dictionary sense of "a particular attitude toward or way of regarding something; a point of view". This seems to me to be a focal issue in the social sciences: our perspectives help to define who we are, and they influence the decisions we make and the nature of the other individuals and groups with which we consider ourselves associated. Within computational linguistics, sentiment analysis is a familiar subset of perspective emphasizing the way that language communicates positive, neutral, and negative polarity, but there are many other ways of looking at perspective that are to my mind equally (and more!) interesting.

    With that in mind, I'm going to organize the content primarily, although perhaps not exclusively, around two main themes.

    This seminar will mainly involve readings and in-class discussion, helped along by participation in discussions on Piazza. Grades will be based on participation (30%), which includes leading class discussions, and on a term paper/project (70%). I hope to encourage hands-on projects that involve real problems, aiming for papers suitable for submission to appropriate conferences.

