
Unlike standalone large language models (LLMs) like ChatGPT, web AI agents can do much more than just generate text. These agents—software tools that use artificial intelligence (AI) to interact with and navigate the web, automating tasks and extracting data—can write emails, plan your day, and perform multiple other actions. But accomplishing these tasks requires access to your private data, making web AI agents highly vulnerable to jailbreaking, in which attackers craft inputs that bypass the agent's safety guardrails and trick it into carrying out malicious requests, as well as other security risks.
In a first-of-its-kind study, a team of researchers from the University of Maryland Institute for Advanced Computer Studies (UMIACS) has identified critical factors that amplify web AI agents’ vulnerability to cyberattacks.
The lead authors of the study are computer science graduate students Jeffrey Yang Fan Chiang and Seungjae Lee. They collaborated on the paper with their advisers Jia-Bin Huang and Furong Huang, both associate professors of computer science, and Yizheng Chen, an assistant professor of computer science. All three faculty have appointments in UMIACS.
“AI companies are betting big on web AI agents,” says Furong Huang, who is also a member of the Institute for Trustworthy AI in Law & Society (TRAILS). “But as these agents become the backbone of automation, we must ensure better safeguards before it’s too late.”
The researchers conducted several tests to better understand these agents’ weaknesses, including instructing them to post insulting comments on an influencer’s Instagram posts. They found that while a standalone LLM refuses to execute such commands, web AI agents follow through with a 47% success rate.
But what the researchers really wanted to find out was why web AI agents are so susceptible to attacks. They designed a structured evaluation framework to investigate this question and identified three crucial factors behind the agents’ structural weaknesses.
First, web AI agents directly insert task descriptions into their core programming, which increases the likelihood of jailbreaking. Second, they generate actions step by step, compounding risks and increasing the chances of harmful consequences. Third, they actively interpret complex web content and track past actions, making it harder to maintain safety constraints over time.
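As a rough illustration of these three factors, the sketch below is a hypothetical, simplified agent loop written for this story; it is not the researchers’ evaluation framework or any real agent’s implementation, and every function and name in it is invented. It shows how a web agent might embed the user’s task directly in its core instructions, emit actions one step at a time, and feed its growing history back into each model call.

```python
# Hypothetical sketch of a web agent loop (illustrative only).

def build_system_prompt(user_task: str) -> str:
    # Factor 1: the raw user task is embedded directly in the agent's core
    # instructions, so a malicious goal travels with every model call.
    return (
        "You are a web agent. Interact with pages to finish the user's goal.\n"
        f"User goal: {user_task}\n"
        "Respond with one action per turn: CLICK(target), TYPE(text), DONE."
    )

def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call; returns canned actions for the demo.
    return "DONE" if "history: 3" in prompt else "CLICK(post_comment_button)"

def run_agent(user_task: str, max_steps: int = 4) -> list[str]:
    system_prompt = build_system_prompt(user_task)
    history: list[str] = []
    for _ in range(max_steps):
        # Factor 2: actions are produced one step at a time, so a harmful goal
        # is decomposed into small, individually innocuous-looking actions.
        # Factor 3: observed page content and the growing action history are
        # fed back into every call, crowding out the original safety check.
        prompt = f"{system_prompt}\npage: <observed HTML>\nhistory: {len(history)}"
        action = fake_model(prompt)
        history.append(action)
        if action == "DONE":
            break
    return history

if __name__ == "__main__":
    print(run_agent("post a comment on the influencer's latest photo"))
```

In this toy loop, no single step looks obviously harmful on its own, which is one way the compounding risk the researchers describe can slip past safeguards designed to judge a single request.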
Their nuanced analysis also yielded a new assessment framework: a five-level structure in which each level represents a different degree of harmfulness. This framework allows for a more precise analysis of security threats, offering important insights into the underlying causes of agent vulnerabilities.
To the researchers’ knowledge, their study is the first to comprehensively and systematically investigate the underlying components that drive these security risks.
Their insights open several areas for future research, including using various frameworks and datasets to better understand risks and design more effective safety mechanisms. With the AI era showing no signs of slowing down, the researchers believe their study’s findings lay the groundwork for developing safer systems that ensure user privacy.
—Story by Aleena Haroon, UMIACS communications group