Teaching AI to Listen: UMD Researcher Models How People Learn New Languages | University of Maryland Institute for Advanced Computer Studies

When learning a new language, one of the biggest challenges is training the ear to recognize unfamiliar sounds. A subtle shift in a vowel or an unfamiliar consonant blend can completely change a word's meaning, yet those distinctions can be difficult for non-native speakers to perceive.

University of Maryland doctoral student Annika Shankwitz is working to understand why. Using machine learning and computational modeling, she is developing models that predict how people perceive non-native speech sounds, helping researchers better understand the relationship between artificial intelligence and human cognition.

“Broadly, I'm interested in how listeners perceive non-native speech sounds,” Shankwitz said. “I want to use machine learning and computational modeling to try and mimic human perceptual patterns that we know about, and by doing that, figure out what computational methods can capture and what they can't.”

Earlier this year, Shankwitz received a National Science Foundation (NSF) Graduate Research Fellowship, one of the nation's most prestigious awards for graduate students. The fellowship provides five years of funding for her research exploring computational optimal transport—a mathematical framework for finding the most efficient way to compare complex data—as a model for predicting second-language speech perception.

Shankwitz was one of only six linguistics students nationwide to receive the fellowship this year.

While her NSF-funded research focuses on optimal transport, another branch of her work uses small neural networks to explore how machine learning can model speech perception. She will present those findings this month at the Society for Computation in Linguistics (SCiL) workshop, held alongside the Annual Meeting of the Association for Computational Linguistics (ACL 2026) in San Diego.

The partnership between SCiL and ACL was co-organized by Naomi Feldman, a professor of linguistics with an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS) and Shankwitz's doctoral adviser. Feldman also secured an NSF grant to provide crossover registration scholarships, reducing financial barriers and encouraging collaboration between linguists and computer scientists. Shankwitz will attend the conference as one of the scholarship recipients.

A member of UMD's Computational Linguistics and Information Processing (CLIP) Lab, Shankwitz credits Feldman's interdisciplinary expertise with helping launch her research career.

“Her research focuses on infant speech perception and cross-linguistic perception, which is a strong match for my interests,” Shankwitz said. “She was the key factor in actually being able to get it.”

Shankwitz's path to computational linguistics was anything but conventional. She originally planned to study music performance before discovering linguistics as an undergraduate at Indiana University, where she earned an interdisciplinary bachelor's degree in computational linguistics.

“I took an introductory linguistics course and thought that speech sounds were the coolest thing ever,” she said. “Finding computational linguistics let me combine computer science, math and linguistics.”

Looking ahead, Shankwitz expects researchers to increasingly use large speech models to better understand how AI processes language compared with how humans do. But she believes researchers will continue to learn from both large and small computational models.

Ultimately, she hopes her research will improve language-learning technologies by helping learners better perceive unfamiliar speech sounds.

If successful, her work could deepen scientists' understanding of human speech perception while informing new tools that make learning a second language more intuitive.

—Story by Diya Sharma, UMIACS communications group