UMIACS Computational Linguistics Colloquium, February 6, 2002

Statistical Approaches for Speech-to-Speech Translation


Yuqing Gao


Lead, Speech-to-Speech Translation Research
IBM T. J. Watson Research Center


Special time and room:
February 6, 2002,
2:30-3:30pm, AVW Room 3258


Constructing high-performance speech-to-speech (S2S) translation systems is extremely complex, involving research in Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Machine Translation (MT), and Natural Language Understanding (NLU) and Generation (NLG). Although substantial progress has been made in each of these components individually over the last two decades, blindly integrating ASR, NLU, MT, NLG, and TTS components into an S2S system does not produce acceptable results. For example, conventional text-based MT systems have not been designed to cope with the imperfect syntax and transcription errors that characterize automatically transcribed conversational speech. Traditional speech recognizers (the ASR component) and speech synthesizers (the TTS component) have not been designed to recognize or synthesize speakers' emotional expressions, which convey meaning and play an important role in communication between human beings. Moreover, traditional speech recognition has almost reached a saturation point in performance because it treats speech and language as separate symbolic signals and does not take the semantics of the underlying utterance into account during decoding.

Three objectives have been set for our project. The first is to investigate new approaches to translation that can handle speech recognition errors and imperfect syntax. The second is to explore new algorithms that improve speech recognition by utilizing the language structure and semantics of the utterance. The third is to explore new algorithms that treat speech recognition and translation as a whole, rather than separating them into individual components.

I will talk about our methods and algorithms, as well as our research progress, in each of these areas.

About the speaker: Yuqing Gao received her Ph.D. in Electrical Engineering from Southeastern University, Nanjing, China in 1989. From 1988 to 1989 she was a visiting scholar at the National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Beijing. From June 1989 until Feb. 1992, she was a research staff member of the National Laboratory of Pattern Recognition, first as an assistant professor (June 1989 - January 1991) and then as an associate professor (January 1991 - Feb. 1992), where she was the project leader and chief scientist for several projects funded by the NSF of China. From March 1992 until May 1993, she was a researcher at CRIN (Centre de Recherche en Informatique de Nancy), Nancy, France. From August 1993 to Nov. 1995, she was a research staff member and project manager for speech recognition research at the Apple-ISS Research Center, Apple Computer Inc., Singapore. Since Dec. 1995, she has been a research staff member at the IBM T. J. Watson Research Center, where she has been project leader for the large-vocabulary continuous speech dictation system and for speech-to-speech translation research. Dr. Gao has published over 60 papers in various conferences and journals and contributed to 4 books. She holds 10 US patents and has been principal investigator for several DARPA- and NSF-funded research projects.


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Philip Resnik (resnik@umiacs.umd.edu).