Feldman Awarded NSF Grant to Study How Children Learn Native Languages | University of Maryland Institute for Advanced Computer Studies

From the time babies are born, they face the huge task of learning how to communicate their needs. Although most children start talking by 12 months of age, their brains begin processing certain aspects of language much earlier.

Learning how infants acquire language is a challenging topic that’s been studied by philosophers and empiricists since Plato, and more recently by linguists, biologists, cognitive psychologists and others.

Now, a computational linguist in the University of Maryland Institute for Advanced Computer Studies (UMIACS) wants to use new technology to better understand how children learn their “native” language, the language an infant is most exposed to from birth.

Naomi Feldman, an assistant professor of linguistics with an appointment in UMIACS, was recently awarded a $520K grant from the National Science Foundation (NSF) to develop computational models to investigate how children develop language-specific perceptual strategies.

The simulations will test the hypothesis that children’s processing of speech can become specialized for their native language through a process of dimension learning that does not rely on knowledge of sound categories.

A better understanding of this learning process could lead to better diagnosis and treatment of developmental language impairments and could provide insight into the difficulties that listeners face when learning a second language in adulthood, Feldman says.

The research funding is part of a collaboration between the NSF’s Social, Behavioral and Economic Sciences Directorate and the Research Councils of the United Kingdom.

Feldman is principal investigator of the three-year project, and will work closely with Sharon Goldwater at the University of Edinburgh.

Feldman notes that using UMIACS’ computational resources will be critical to the study.

“This is entirely a computational project,” she says. “We are going to be using UMIACS computing power to train models on large speech datasets, and run simulations modeling human perceptual data.”

Feldman and Goldwater propose two models of dimension learning. The first relies on temporal information as a proxy for sound category knowledge, while the second relies on top-down information from similar words, which infants have been shown to use. Each model is trained on speech recordings from a particular language and is evaluated on its ability to predict how adults and infants with that language background discriminate sounds.

Feldman says the research will yield new methods for training and testing cognitive models of language with naturalistic speech recordings and has the potential to significantly impact theories of how and when children learn about the sounds of their native language.

The study could also lead to improved speech technology for low-resource languages, which are languages that are not spoken by many people in the world or that lack digital resources such as large-scale, annotated databases.

This could ultimately lead to systems that learn more effectively using little or no transcribed audio, Feldman says.

“These systems could become important tools for documenting and analyzing endangered and minority languages, and could help make speech technology more universally available,” she says.

—Story by Melissa Brachfeld