
Despite significant advancements in automatic speech recognition (ASR) systems—technology that uses artificial intelligence (AI) and natural language processing to drive popular voice assistants like Siri and similar apps—large hurdles remain, particularly when it comes to gathering fair and accurate datasets upon which to train the AI models.
This problem is even more glaring for populations that can benefit the most from using these advanced technologies for education—young children ages 3 to 8.
To address this challenge, a University of Maryland expert in education policy analysis recently co-hosted an international workshop focused on improving the collection, curation and dissemination of data that is needed for ASR systems used in education.
The International Workshop on Rethinking Children’s Automatic Speech Recognition for Education—held on September 10 at the University at Buffalo—brought together more than 50 researchers, practitioners, funders, developers, and educators to foster real conversations about moving the field forward, says Jing Liu, an associate professor of education policy at the University of Maryland.
“The data problem is real and urgent,” says Liu, who directs the Center for Educational Data Science and Innovation at UMD. “We're facing a massive shortage of children's speech datasets, which creates barriers for developing accurate, fair ASR systems.
Meanwhile, Liu adds, the potential applications in EdTech and educational assessment are mind-blowing—including uses like real-time reading support, authentic language assessment, and truly adaptive learning platforms.
The daylong workshop explored these topics, and more, through a series of panels, presentations, and small group discussions.
Assisting Liu in organizing and running the workshop was Jinjun Xiong, a professor of computer science and engineering at Buffalo who also directs the National AI Institute for Exceptional Education.
Workshop attendees ultimately expect to produce a white paper to guide the field forward, as well as build a consortium of experts to collectively tackle the data shortage by developing a shared computational infrastructure.
“Children's ASR might sound niche and technical, but it's actually fundamental to the future of AI in education,” says Liu, who is an affiliate member in the University of Maryland Institute for Advanced Computer Studies (UMIACS) and a researcher at the Institute for Trustworthy AI in Law & Society (TRAILS). “When we get the technical infrastructure right—like accurately understanding children's speech—we open doors to learning experiences we can barely imagine today.”
—Story by Maria Herd, UMIACS communications group