"Learning from Speech Production for Improved Recognition"

Fri Jan 31, 2014 11:00 AM

Location: A.V. Williams, Room 2120

Karen Livescu
Toyota Technological Institute at Chicago
University of Chicago

Ideas from speech production research have motivated several lines of work in the speech recognition research community. Unfortunately, our understanding of speech articulation is still quite limited, and articulatory measurement data are scarce. How can we take advantage of speech production information without relying too heavily on noisy or incomplete knowledge?

This talk will cover recent work exploring this area, with the theme of using machine learning ideas to automatically infer information where our knowledge and data are lacking. The talk will describe new techniques for deriving improved acoustic features using articulatory data in a multi-view learning setting. These techniques are based on canonical correlation analysis and its nonlinear extensions, including our recently introduced extension using deep neural networks. The talk will also cover recent work that uses no articulatory data at all, instead treating articulatory information as hidden variables in models for lexical access and spoken term detection.
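To give a flavor of the multi-view setting mentioned in the abstract, the sketch below implements standard linear canonical correlation analysis (CCA) in NumPy: given two views of paired data (e.g. acoustic and articulatory features), CCA finds linear projections of each view that are maximally correlated. This is a minimal illustration of the classical technique, not the speaker's implementation; the deep-network extension referred to in the abstract replaces these linear projections with neural networks.

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-6):
    """Linear CCA: projections of views X (n x d1) and Y (n x d2)
    with maximal correlation. reg is a small ridge term for stability."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # whiten each view, then SVD of the whitened cross-covariance
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(T)
    A = inv_sqrt(Sxx) @ U[:, :k]   # projection for view 1 (e.g. acoustics)
    B = inv_sqrt(Syy) @ Vt[:k].T   # projection for view 2 (e.g. articulation)
    return A, B, s[:k]             # s holds the canonical correlations

# Toy usage: two 3-dimensional views sharing one latent dimension.
rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 1))
X = np.hstack([z, rng.normal(size=(2000, 2))])
Y = np.hstack([z + 0.1 * rng.normal(size=(2000, 1)), rng.normal(size=(2000, 2))])
A, B, s = cca(X, Y, k=1)
```

In the speech setting of the talk, the projection learned for the acoustic view can then be used as an improved feature transform at recognition time, even when articulatory measurements are unavailable.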