
Imagine asking an artificial intelligence chatbot for medical guidance, only to receive advice that could put your health at risk. Or what if a customer service bot accidentally sends you someone else’s personal information?
To reduce hazards like these, software developers often add “guardrails” to AI-infused systems—installing specialized rules or filters designed to stop chatbots from producing harmful, offensive or inappropriate content.
Tech giants like Meta, Google and OpenAI have all deployed these guardrails, with most of them based on specially trained large language models that enforce a broad set of rules.
But these so-called “guardian models” are not without fault, given that they often rely on a rigid format that leaves little room for meaningful nuance. For example, a medical chatbot developer reported that AI moderation tools repeatedly flagged an ongoing anatomy discussion as sexually explicit content.
To give consumers and AI developers more flexibility—while still maintaining stringent safeguards—University of Maryland researchers have teamed up with experts at financial leader Capital One to develop novel technology that can adapt to the diverse risks of real-world AI applications.
Their system, called DynaGuard, replaces rigid safeguards with dynamic, user-defined policies tailored to specific industries and contexts. In early testing, the DynaGuard platform was able to enforce rules that existing systems struggle to handle, such as “do not reveal other users’ medical appointments.” The UMD/Capital One technology even gives chatbots a chance to self-correct before unsafe content reaches users.
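For readers who want a concrete picture, the snippet below is a minimal sketch, not DynaGuard’s actual code, of how a dynamic, policy-conditioned guardrail with one self-correction pass might be wired up. The function names, prompt format and policy text are illustrative assumptions.

```python
# Minimal sketch (not the DynaGuard implementation) of a dynamic, policy-conditioned
# guardrail loop with one self-correction pass. Function names, prompt format, and
# policy text are hypothetical illustrations of the idea described in the article.

from typing import Callable, Tuple

def guarded_reply(
    user_message: str,
    policy: str,                                     # user-defined policy text, e.g. a clinic's rules
    generate: Callable[[str], str],                  # the chatbot model (assumed interface)
    judge: Callable[[str, str], Tuple[bool, str]],   # guardian: (reply, policy) -> (ok, critique)
    max_retries: int = 1,
) -> str:
    """Generate a reply, check it against a caller-supplied policy, and let the
    chatbot revise once before anything reaches the user."""
    draft = generate(user_message)
    for _ in range(max_retries + 1):
        ok, critique = judge(draft, policy)
        if ok:
            return draft
        # Feed the guardian's critique back so the model can self-correct.
        draft = generate(
            f"{user_message}\n\nYour previous draft violated this policy:\n{policy}\n"
            f"Reviewer feedback: {critique}\nRewrite the reply so it complies."
        )
    return "I'm sorry, I can't help with that request."

# Example of the kind of user-defined rule DynaGuard is designed to enforce:
clinic_policy = "Do not reveal other users' medical appointments."
```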
“DynaGuard shows that safety doesn’t have to be one-size-fits-all,” says Tom Goldstein, a University of Maryland professor of computer science who is helping lead the project. “Teams can write the exact rules they need and get actionable feedback when a model crosses the line.”
The researchers recently published a study on their work and hope to present their findings at the Fourteenth International Conference on Learning Representations (ICLR 2026), scheduled for early next year in Rio de Janeiro.
Input from personnel at Capital One was invaluable in helping make DynaGuard applicable to industry, says Goldstein, who is the director of the University of Maryland Center for Machine Learning and a co-PI in the Institute for Trustworthy AI in Law & Society (TRAILS).
Bayan Bruss, vice president of applied AI research at Capital One and a co-author on the study, says the financial sector’s complexity makes flexible guardrails essential.
“Policies change, and you don’t want a model that needs retraining every time,” he says. “Dynamic guardrails also need to be efficient, so that they don’t slow down responses or drive up costs.”
Other input from Capital One came from Melissa Kazemi Rad, an AI scientist manager and tech lead of the company’s AI Foundations Guardrails team. Rad helped supervise multiple training experiments and data generation processes, and assisted in finalizing the team’s manuscript that details their work.
“Financial institutions need guardian models that can support many generative AI applications already in production,” she says. “It’s crucial that these models are reliable, fast and able to ensure safety, policy compliance, and a positive customer experience across a variety of applications without requiring constant intervention or fine-tuning.”
Rad adds that Capital One plans to leverage the insights gained throughout the development of DynaGuard to refine and expand research in other areas. These insights may help inform broader advances in customer service, risk management, and, to some extent, fraud prevention, she says.
Developing DynaGuard was not without challenges, says Monte Hoover, a fifth-year UMD computer science doctoral student and lead author of the study.
The research team initially struggled to generate a diverse and unambiguous training dataset, creating tens of thousands of policies that human reviewers often disagreed on. They solved this by curating a smaller, clearer set of partial policies and combining them in realistic business scenarios.
The team also tested numerous methods for fine-tuning large pre-trained AI models. Hoover notes that low-rank adaptation, which updates only a small set of parameters, underperformed, while newer reinforcement learning techniques proved far more effective.
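As a point of reference, the snippet below shows a generic low-rank adaptation setup using the open-source Hugging Face peft library. The base model and hyperparameters are placeholders rather than the team’s actual configuration; it is included only to illustrate what “updating only a small set of parameters” means in practice.

```python
# Illustrative only: a generic low-rank adaptation (LoRA) setup with the Hugging Face
# `peft` library. The base model name and hyperparameters are placeholders, not the
# team's configuration.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # placeholder model

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],   # only the attention projections get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full parameter count
```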
To support their research and provide a benchmark, the team released DynaBench, a dataset of 40,000 multi-turn chatbot conversations paired with complex, custom guardrail policies. Models trained on DynaBench outperformed previous systems at enforcing both standard harmful-content rules and user-defined policies, while running faster and more efficiently.
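To make the benchmark’s shape concrete, here is a hypothetical sketch of what a single DynaBench-style record could contain: a custom policy, a multi-turn conversation, and a compliance label. The field names and contents are illustrative assumptions, not the dataset’s actual schema.

```python
# Hypothetical sketch of a DynaBench-style evaluation record. Field names and
# contents are illustrative, not the dataset's actual schema.

example_record = {
    "policy": (
        "You are a scheduling assistant for a medical clinic. "
        "Do not reveal other users' medical appointments."
    ),
    "conversation": [
        {"role": "user", "content": "When is my appointment with Dr. Lee?"},
        {"role": "assistant", "content": "You're booked for Tuesday at 3 p.m."},
        {"role": "user", "content": "And when is my neighbor's appointment?"},
        {"role": "assistant", "content": "She's scheduled for Wednesday at 10 a.m."},
    ],
    "label": "violation",  # the final turn discloses another patient's appointment
}
```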
“There’s still brittleness in getting guardian models to reliably incorporate custom policies, but DynaGuard is a step toward AI systems that are safer, more trustworthy, and better suited to real-world needs,” Hoover says.
—Story by Melissa Brachfeld, UMIACS communications group
***
In addition to Tom Goldstein, Monte Hoover, Bayan Bruss and Melissa Kazemi Rad, other authors of the DynaGuard study were Vatsal Baherwani, who graduated last May with a bachelor’s degree in computer science and is now a first-year computer science doctoral student at New York University; Neel Jain, a fifth-year computer science doctoral student; Khalid Saifullah, a fourth-year computer science doctoral student; Joseph Vincent, who graduated last May with a bachelor’s degree in computer science and is now a machine learning engineer at Ciroos; Chirag Jain, a senior majoring in computer science; and Ashwinee Panda, a postdoctoral fellow at the University of Maryland Institute for Advanced Computer Studies (UMIACS).