Alan Liu, an assistant professor of computer science, has received funding from the National Science Foundation (NSF) to expand his research on telemetry and observability for large-scale AI and cloud infrastructure.
Liu, who holds an appointment in the University of Maryland Institute for Advanced Computer Studies and is a core member of the Maryland Cybersecurity Center, is principal investigator of an NSF Faculty Early Career Development Program (CAREER) award, expected to total approximately $700,000 over the next five years.
This highly competitive award, one of NSF’s most prestigious for early-career faculty, recognizes researchers with the potential to serve as academic role models and drive advances in their fields.
Liu’s project focuses on building the systems foundation needed to observe, understand, and manage complex computing infrastructure at scale. Modern AI services depend on massive networks of servers, accelerators, and software components that generate overwhelming amounts of operational data, making traditional monitoring approaches increasingly difficult to use.
The core challenge is that as AI infrastructure grows, the data describing network traffic, resource bottlenecks, and system failures becomes too voluminous to monitor completely. Liu is placing approximation at the center of his solution, developing compact, uncertainty-aware summaries that preserve the most critical data. To cover the full pipeline from collection to analysis, the project focuses on four key areas: creating smart data snapshots directly on devices, designing fast search methods to query those snapshots, using intelligent compression to save storage space over the long term, and building a management engine that automatically balances accuracy, speed, and cost.
By creating these efficient data summaries, the project aims to let system operators diagnose problems in real time and make swift trade-offs among accuracy, speed, and cost.
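To make the idea of a compact, uncertainty-aware summary concrete, here is a minimal sketch of a Count-Min sketch, a well-known approximate counting structure. It is offered only as an illustration of the general technique; the brief does not say which data structures the project uses, and the class name, parameters, and sample keys below are hypothetical.

```python
import hashlib
import math


class CountMinSketch:
    """Fixed-size frequency summary (illustrative, not the project's code).

    Memory use depends only on epsilon and delta, not on how many
    distinct keys are observed. Estimates never undercount.
    """

    def __init__(self, epsilon: float, delta: float) -> None:
        # Standard sizing: width = ceil(e / epsilon), depth = ceil(ln(1 / delta)).
        self.epsilon = epsilon
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.total = 0
        self.rows = [[0] * self.width for _ in range(self.depth)]

    def _index(self, key: str, row: int) -> int:
        # Salting the hash with the row number gives one hash function per row.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        self.total += count
        for row in range(self.depth):
            self.rows[row][self._index(key, row)] += count

    def estimate(self, key: str) -> tuple[int, float]:
        """Return (estimate, error_bound).

        The true count is at most `estimate`, and with probability at
        least 1 - delta it is at least `estimate - error_bound`.
        """
        est = min(self.rows[r][self._index(key, r)] for r in range(self.depth))
        return est, self.epsilon * self.total


if __name__ == "__main__":
    sketch = CountMinSketch(epsilon=0.01, delta=0.01)
    sketch.add("10.0.0.1", 500)  # hypothetical flow keys
    sketch.add("10.0.0.2", 20)
    print(sketch.estimate("10.0.0.1"))
```

The trade-off mentioned above shows up directly in the two parameters: a smaller `epsilon` tightens the error bound at the cost of more memory, and a smaller `delta` raises confidence at the cost of more hash computations per update.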
Beyond the lab, Liu is using the grant to train the next generation of researchers. By producing open-source software, documentation, and educational materials shared publicly through a dedicated project repository, the project aims to prepare students, researchers, and practitioners to build and manage robust, trustworthy AI infrastructure as it grows more complex.
—News brief adapted from an article by the Department of Computer Science
Award Information:
“Approximation-First Telemetry for Hyperscale Networked Systems” is supported by NSF grant #2544434 from the NSF’s Division of Information & Intelligent Systems.
PI: Alan Liu, an assistant professor of computer science with an appointment in the University of Maryland Institute for Advanced Computer Studies.
About the CAREER award: The Faculty Early Career Development (CAREER) Program is an NSF activity that offers the foundation’s most prestigious awards in support of junior faculty who exemplify the role of teacher-scholars through outstanding research, excellent education and the integration of education and research within the context of the mission of their organization.